pubchem

Note

This module has the following additional requirements:

pandas>=1.0.1
beautifulsoup4>=4.7.0

These can be installed as follows:

$ python -m pip install chemistry_tools[pubchem]

atom

class chemistry_tools.pubchem.atom.Atom(aid, number, x=None, y=None, z=None, charge=0)[source]

Class to represent an atom in a Compound.

coordinate_type

Whether this atom has 2D or 3D coordinates.

element

The element symbol for this atom.

set_coordinates(x, y, z=None)[source]

Set all coordinate dimensions at once.

to_dict()[source]

Return a dictionary containing Atom data.

bond

class chemistry_tools.pubchem.bond.Bond(aid1, aid2, order=1, style=None)[source]

Class to represent a bond between two atoms in a Compound.

to_dict()[source]

Return a dictionary containing Bond data.

class chemistry_tools.pubchem.bond.BondType[source]
COMPLEX = 6
DATIVE = 5
DOUBLE = 2
IONIC = 7
QUADRUPLE = 4
SINGLE = 1
TRIPLE = 3
UNKNOWN = 255
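
The values above can be read as an integer-valued enumeration. The following is a minimal stand-in, not the library's own class: it assumes BondType behaves like a standard IntEnum (this reference does not state its base class), and describe_order is a hypothetical helper that shows how a numeric Bond order could be mapped to a name.

```python
from enum import IntEnum


class BondTypeSketch(IntEnum):
    """Stand-in mirroring the documented BondType values (assumed IntEnum)."""
    SINGLE = 1
    DOUBLE = 2
    TRIPLE = 3
    QUADRUPLE = 4
    DATIVE = 5
    COMPLEX = 6
    IONIC = 7
    UNKNOWN = 255


def describe_order(order):
    """Hypothetical helper: map a numeric bond order to its name."""
    try:
        return BondTypeSketch(order).name
    except ValueError:
        # Values outside the documented set fall back to UNKNOWN.
        return BondTypeSketch.UNKNOWN.name


print(describe_order(2))
```

Falling back to UNKNOWN for unlisted values is a design choice of this sketch only; the library may handle such values differently.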

compound

class chemistry_tools.pubchem.compound.Compound(record)[source]

Corresponds to a single record from the PubChem Compound database.

The PubChem Compound database is constructed from the Substance database using a standardization and deduplication process. Each Compound is uniquely identified by a CID.

aids

A list of all AIDs for Assays associated with this Compound.

Requires an extra request. Result is cached.

atom_stereo_count

Atom stereocenter count.

atoms

List of Atoms in this Compound.

boiling_point

Boiling Point

bond_stereo_count

Bond stereocenter count.

bonds

List of Bonds between Atoms in this Compound.

cactvs_fingerprint

PubChem CACTVS fingerprint.

Each bit in the fingerprint represents the presence or absence of one of 881 chemical substructures.

More information at ftp://ftp.ncbi.nlm.nih.gov/pubchem/specifications/pubchem_fingerprints.txt
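
As a rough illustration of how the raw, padded hex fingerprint (see the fingerprint property of this class) could relate to the 881-bit string, here is a minimal sketch. The layout is an assumption not stated in this reference: the first 8 hex characters are taken to be a length prefix and the final 7 bits to be padding. decode_fingerprint is a hypothetical helper, and the input is synthetic rather than a real PubChem record.

```python
def decode_fingerprint(hex_fingerprint, n_bits=881, prefix_hex=8, pad_bits=7):
    """Turn a padded hex fingerprint into a string of '0'/'1' characters.

    Assumed layout: `prefix_hex` hex characters of length prefix, then the
    payload, whose last `pad_bits` bits are padding.
    """
    payload = hex_fingerprint[prefix_hex:]
    bits = bin(int(payload, 16))[2:]
    # Restore leading zeros lost by the integer conversion.
    bits = bits.zfill(len(payload) * 4)
    # Drop the trailing padding bits, if any.
    if pad_bits:
        bits = bits[:-pad_bits]
    if len(bits) != n_bits:
        raise ValueError(f"expected {n_bits} bits, got {len(bits)}")
    return bits


# Synthetic 881-bit pattern with only the first and last bits set.
synthetic_bits = "1" + "0" * 879 + "1"
padded = synthetic_bits + "0" * 7                     # 888 bits in total
hex_fp = "00000371" + format(int(padded, 2), "0222x")  # 0x371 == 881
decoded = decode_fingerprint(hex_fp)
print(len(decoded), decoded[0], decoded[-1])
```

Each character of the decoded string then corresponds to one substructure key in the specification linked above.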

canonical_smiles

Canonical SMILES, with no stereochemistry information.

charge

Formal charge on this Compound.

cid

The PubChem Compound Identifier (CID).

Note

When searching using a SMILES or InChI query that is not present in the PubChem Compound database, an automatically generated record may be returned that contains properties that have been calculated on the fly. These records will not have a CID property.

color

Color/Form

complexity

Complexity.

conformer_id_3d
conformer_rmsd_3d
coordinate_type
covalent_unit_count

Covalently-bonded unit count.

defined_atom_stereo_count

Defined atom stereocenter count.

defined_bond_stereo_count

Defined bond stereocenter count.

density

Density/Specific Gravity

dissociation_constant

Dissociation Constants

effective_rotor_count_3d
elements

List of element symbols for atoms in this Compound.

exact_mass

Exact mass.

feature_selfoverlap_3d
fingerprint

Raw padded and hex-encoded fingerprint, as returned by the PUG REST API.

static format_string(stringwithmarkup)[source]
classmethod from_cid(cid, **kwargs)[source]

Retrieve the Compound record for the specified CID.

Usage:

c = Compound.from_cid(6819)

Parameters: cid (int) – The PubChem Compound Identifier (CID).
full_record
get_property(property)[source]
get_property_description(property)[source]
get_property_unit(property)[source]
get_property_value(property)[source]
h_bond_acceptor_count

Hydrogen bond acceptor count.

h_bond_donor_count

Hydrogen bond donor count.

heat_combustion

Heat of Combustion

heavy_atom_count

Heavy atom count.

hill_formula
inchi

InChI string.

inchikey

InChIKey.

is_canonicalized

Whether this Compound is canonicalized.

isomeric_smiles

Isomeric SMILES.

isotope_atom_count

Isotope atom count.

iupac_name

Preferred IUPAC name.

melting_point

Melting Point

mmff94_energy_3d
mmff94_partial_charges_3d
molecular_formula

Molecular formula.

molecular_mass

Molecular Mass.

molecular_weight

Molecular Weight.

monoisotopic_mass

Monoisotopic mass.

multipoles_3d
odor

Odor

other_props

Other Chemical/Physical Properties

partition_coeff

Octanol/Water Partition Coefficient

pharmacophore_features_3d
record

The raw compound record returned by the PubChem PUG REST service.

rotatable_bond_count

Rotatable bond count.

shape_fingerprint_3d
shape_selfoverlap_3d
sids

A list of all SIDs for Substances associated with this Compound.

Requires an extra request. Result is cached.

smiles

Canonical SMILES, with no stereochemistry information.

solubility

Solubility

specific_gravity

Density/Specific Gravity

spectral_props

Spectral Properties

surface_tension

Surface Tension

systematic_name

Systematic IUPAC name.

to_dict(properties=None)[source]

Return a dictionary containing Compound data. Optionally specify a list of the desired properties.

synonyms, aids and sids are not included unless explicitly specified using the properties parameter. This is because they each require an extra request.

to_series(properties=None)[source]

Return a pandas Series containing Compound data. Optionally specify a list of the desired properties.

synonyms, aids and sids are not included unless explicitly specified using the properties parameter. This is because they each require an extra request.

tpsa

Topological Polar Surface Area.

undefined_atom_stereo_count

Undefined atom stereocenter count.

undefined_bond_stereo_count

Undefined bond stereocenter count.

vapor_density

Vapor Density

vapor_pressure

Vapor Pressure

volume_3d
xlogp

XLogP.

class chemistry_tools.pubchem.compound.CompoundIdType[source]
COMPONENT = 2

Component of the Standardized Form

DEPOSITED = 0

Original Deposited Compound

IONIZED = 6

Ionized pKa Form of the Standardized Form

MIXTURE = 4

Deposited Mixture Component

NEUTRALIZED = 3

Neutralized Form of the Standardized Form

STANDARDIZED = 1

Standardized Form of the Deposited Compound

TAUTOMER = 5

Alternate Tautomer Form of the Standardized Form

UNKNOWN = 255

Unspecified or Unknown Compound Type

chemistry_tools.pubchem.compound.compounds_to_frame(compounds, properties=None)[source]

Construct a pandas DataFrame from a list of Compound objects.

Optionally specify a list of the desired Compound properties.

errors

Exceptions and error-handling helpers.

exception chemistry_tools.pubchem.errors.BadRequestError(msg='Request is improperly formed')[source]

Request is improperly formed (syntax error in the URL, POST body, etc.).

exception chemistry_tools.pubchem.errors.MethodNotAllowedError(msg='Request not allowed')[source]

Request not allowed (such as invalid MIME type in the HTTP Accept header).

exception chemistry_tools.pubchem.errors.NotFoundError(msg='The input record was not found')[source]

The input record was not found (e.g. invalid CID).

exception chemistry_tools.pubchem.errors.PubChemHTTPError(e)[source]

Generic error class to handle all HTTP error codes.

exception chemistry_tools.pubchem.errors.PubChemPyDeprecationWarning[source]

Warning category for deprecated features.

exception chemistry_tools.pubchem.errors.PubChemPyError[source]

Base class for all PubChemPy exceptions.

exception chemistry_tools.pubchem.errors.ResponseParseError[source]

PubChem response is uninterpretable.

exception chemistry_tools.pubchem.errors.ServerError(msg='Some problem on the server side')[source]

Some problem on the server side (such as a database server down, etc.).

exception chemistry_tools.pubchem.errors.TimeoutError(msg='The request timed out')[source]

The request timed out, either because of server overload or because the request was too broad.

See Avoiding TimeoutError for more information.

exception chemistry_tools.pubchem.errors.UnimplementedError(msg='The requested operation has not been implemented')[source]

The requested operation has not (yet) been implemented by the server.
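
To show how these exceptions could relate to HTTP responses, here is a minimal sketch using stand-in classes. The status-code mapping is an assumption based on the codes listed in the PUG REST documentation; this reference does not describe how the library itself dispatches errors, and every name ending in Sketch is hypothetical.

```python
class HTTPErrorSketch(Exception):
    """Stand-in for a generic HTTP error with a default message."""
    default_msg = "Unclassified HTTP error"

    def __init__(self, msg=None):
        super().__init__(msg or self.default_msg)


class BadRequestSketch(HTTPErrorSketch):
    default_msg = "Request is improperly formed"


class NotFoundSketch(HTTPErrorSketch):
    default_msg = "The input record was not found"


class MethodNotAllowedSketch(HTTPErrorSketch):
    default_msg = "Request not allowed"


class ServerErrorSketch(HTTPErrorSketch):
    default_msg = "Some problem on the server side"


class UnimplementedSketch(HTTPErrorSketch):
    default_msg = "The requested operation has not been implemented"


class TimeoutSketch(HTTPErrorSketch):
    default_msg = "The request timed out"


# Assumed mapping from HTTP status code to exception class.
ERRORS_BY_STATUS = {
    400: BadRequestSketch,
    404: NotFoundSketch,
    405: MethodNotAllowedSketch,
    500: ServerErrorSketch,
    501: UnimplementedSketch,
    504: TimeoutSketch,
}


def raise_for_status(code):
    """Raise the matching stand-in exception for error status codes."""
    if code >= 400:
        raise ERRORS_BY_STATUS.get(code, HTTPErrorSketch)()


try:
    raise_for_status(404)
except NotFoundSketch as exc:
    print(f"caught: {exc}")
```

Deriving every class from one base lets callers catch a specific failure or all HTTP failures with a single except clause.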

chemistry_tools.pubchem.errors.deprecated(message=None)[source]

Decorator to mark functions as deprecated. A warning will be emitted when the function is used.
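
A decorator of this shape can be sketched with the standard library alone. The following is an illustration under assumptions, not the library's implementation: deprecated_sketch, DeprecationSketchWarning, and the default message text are all hypothetical.

```python
import functools
import warnings


class DeprecationSketchWarning(Warning):
    """Stand-in for the library's deprecation warning category."""


def deprecated_sketch(message=None):
    """Return a decorator that warns each time the wrapped function is called."""
    def decorator(func):
        text = message or f"{func.__name__} is deprecated"

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # stacklevel=2 attributes the warning to the caller, not the wrapper.
            warnings.warn(text, category=DeprecationSketchWarning, stacklevel=2)
            return func(*args, **kwargs)

        return wrapper
    return decorator


@deprecated_sketch("Use new_add instead")
def old_add(a, b):
    return a + b


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = old_add(1, 2)

print(result, [str(w.message) for w in caught])
```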

substance

class chemistry_tools.pubchem.substance.Substance(record)[source]

Corresponds to a single record from the PubChem Substance database.

The PubChem Substance database contains all chemical records deposited in PubChem in their most raw form, before any significant processing is applied. As a result, it contains duplicates, mixtures, and some records that don’t make chemical sense. Substance records therefore contain fewer calculated properties; however, they do have additional information about the original source that deposited the record.

The PubChem Compound database is constructed from the Substance database using a standardization and deduplication process. Hence each Compound may be derived from a number of different Substances.

aids

A list of all AIDs for Assays associated with this Substance.

Requires an extra request. Result is cached.

cids

A list of all CIDs for Compounds that were produced when this Substance was standardized.

Requires an extra request. Result is cached.

deposited_compound

Return a Compound produced from the unstandardized Substance record as deposited.

The resulting Compound will not have a cid and will be missing most properties.

classmethod from_sid(sid)[source]

Retrieve the Substance record for the specified SID.

Parameters: sid (int) – The PubChem Substance Identifier (SID).
sid

The PubChem Substance Identifier (SID).

source_id

Unique ID for this Substance within those from the same PubChem depositor source.

source_name

The name of the PubChem depositor that was the source of this Substance.

standardized_cid

The CID of the Compound that was produced when this Substance was standardized.

May not exist if this Substance was not standardizable.

standardized_compound

Return the Compound that was produced when this Substance was standardized.

Requires an extra request. Result is cached.

synonyms

A ranked list of all the names associated with this Substance.

to_dict(properties=None)[source]

Return a dictionary containing Substance data.

If the properties parameter is not specified, everything except cids and aids is included. This is because the aids and cids properties each require an extra request to retrieve.

Parameters: properties – (optional) A list of the desired properties.
to_series(properties=None)[source]

Return a pandas Series containing Substance data.

If the properties parameter is not specified, everything except cids and aids is included. This is because the aids and cids properties each require an extra request to retrieve.

Parameters: properties – (optional) A list of the desired properties.
chemistry_tools.pubchem.substance.substances_to_frame(substances, properties=None)[source]

Construct a pandas DataFrame from a list of Substance objects.

Optionally specify a list of the desired Substance properties.

utils

Various tools

chemistry_tools.pubchem.utils.download(outformat, path, identifier, namespace='cid', domain='compound', operation=None, searchtype=None, overwrite=False, **kwargs)[source]

The output format can be XML, ASNT, ASNB, JSON, SDF, CSV, PNG or TXT.

chemistry_tools.pubchem.utils.format_string(stringwithmarkup)[source]

Convert a PubChem-formatted string into an HTML-formatted string.

chemistry_tools.pubchem.utils.get(identifier, namespace='cid', domain='compound', operation=None, output='JSON', searchtype=None, **kwargs)[source]

Request wrapper that automatically handles async requests.

chemistry_tools.pubchem.utils.get_aids(identifier, namespace='cid', domain='compound', searchtype=None, **kwargs)[source]
chemistry_tools.pubchem.utils.get_all_sources(domain='substance')[source]

Return a list of all current depositors of substances or assays.

chemistry_tools.pubchem.utils.get_cids(identifier, namespace='name', domain='compound', searchtype=None, **kwargs)[source]
chemistry_tools.pubchem.utils.get_full_json(cid)[source]
chemistry_tools.pubchem.utils.get_json(identifier, namespace='cid', domain='compound', operation=None, searchtype=None, **kwargs)[source]

Request wrapper that automatically parses JSON response and suppresses NotFoundError.
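
The "parse JSON, suppress NotFoundError" behaviour can be sketched as follows. This is an illustration only: get_json_sketch, NotFoundSketch, and fake_fetch are hypothetical, the transport is a local fake rather than a network call, and returning None for a missing record is an assumption of this sketch.

```python
import json


class NotFoundSketch(Exception):
    """Stand-in for NotFoundError."""


def get_json_sketch(fetch, *args, **kwargs):
    """Call `fetch`, parse its text as JSON, and return None if the record is missing."""
    try:
        return json.loads(fetch(*args, **kwargs))
    except NotFoundSketch:
        # Assumed behaviour: a missing record yields None instead of an exception.
        return None


def fake_fetch(identifier):
    """Hypothetical transport used only for this demo; no network access."""
    if identifier != 2244:
        raise NotFoundSketch("The input record was not found")
    return '{"IdentifierList": {"CID": [2244]}}'


print(get_json_sketch(fake_fetch, 2244))
print(get_json_sketch(fake_fetch, 1))
```

Callers of a wrapper like this need to check for a missing result explicitly, since other errors still propagate as exceptions.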

chemistry_tools.pubchem.utils.get_properties(properties, identifier, namespace='cid', searchtype=None, as_dataframe=False, **kwargs)[source]

Retrieve the specified properties from PubChem.

Parameters:
  • identifier – The compound, substance or assay identifier to use as a search query.
  • namespace – (optional) The identifier type.
  • searchtype – (optional) The advanced search type, one of substructure, superstructure or similarity.
  • as_dataframe – (optional) Automatically extract the properties into a pandas DataFrame.
chemistry_tools.pubchem.utils.get_sdf(identifier, namespace='cid', domain='compound', operation=None, searchtype=None, **kwargs)[source]

Request wrapper that automatically parses SDF response and suppresses NotFoundError.

chemistry_tools.pubchem.utils.get_sids(identifier, namespace='cid', domain='compound', searchtype=None, **kwargs)[source]
chemistry_tools.pubchem.utils.get_synonyms(identifier, namespace='cid', domain='compound', searchtype=None, **kwargs)[source]
chemistry_tools.pubchem.utils.memoized_property(fget)[source]

Decorator to create memoized properties.

Used to cache Compound and Substance properties that require an additional request.
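
The caching pattern described here can be sketched in a few lines. This is one common way to write such a decorator, offered as an assumption about the general technique rather than as the library's code: memoized_property_sketch and RecordSketch are hypothetical, and the counter stands in for the extra request.

```python
import functools


def memoized_property_sketch(fget):
    """Property whose getter runs once per instance; the result is stored on the instance."""
    attr_name = "_" + fget.__name__

    @functools.wraps(fget)
    def fget_memoized(self):
        if not hasattr(self, attr_name):
            setattr(self, attr_name, fget(self))
        return getattr(self, attr_name)

    return property(fget_memoized)


class RecordSketch:
    """Hypothetical record type; request_count stands in for network requests."""

    def __init__(self):
        self.request_count = 0

    @memoized_property_sketch
    def synonyms(self):
        self.request_count += 1  # pretend this is the extra request
        return ["aspirin", "acetylsalicylic acid"]


record = RecordSketch()
first = record.synonyms
second = record.synonyms
print(record.request_count)
```

Because the cached value lives on the instance, each object pays for the expensive lookup at most once, and separate objects do not share results.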

chemistry_tools.pubchem.utils.request(identifier, namespace='cid', domain='compound', operation=None, output='JSON', searchtype=None, **kwargs)[source]

Construct API request from parameters and return the response.

Full specification at http://pubchem.ncbi.nlm.nih.gov/pug_rest/PUG_REST.html
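
For orientation, the PUG REST specification describes request URLs of the form domain/namespace/identifiers/operation/output. The sketch below only assembles such a URL and makes no network call. build_url is a hypothetical helper, and it covers the simple GET-style case only; how the library encodes identifiers, search types, or POST bodies is not described in this reference.

```python
from urllib.parse import quote

API_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def build_url(identifier, namespace="cid", domain="compound",
              operation=None, output="JSON"):
    """Assemble a PUG REST style URL from its parts (sketch only)."""
    if isinstance(identifier, (list, tuple)):
        # Several identifiers are joined with commas.
        identifier = ",".join(str(i) for i in identifier)
    # Percent-encode the identifier but keep commas between identifiers.
    encoded = quote(str(identifier), safe=",")
    parts = [domain, namespace, encoded, operation, output]
    return API_BASE + "/" + "/".join(p for p in parts if p)


print(build_url(2244))
print(build_url([1, 2], operation="property/MolecularFormula"))
print(build_url("acetic acid", namespace="name"))
```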