pubchem
¶
Note
This module has the following additional requirements:
pandas>=1.0.1
beautifulsoup4>=4.7.0
These can be installed as follows:
$ python -m pip install chemistry_tools[pubchem]
atom
¶
bond
¶
compound
¶
-
class
chemistry_tools.pubchem.compound.
Compound
(record)[source]¶ Corresponds to a single record from the PubChem Compound database.
The PubChem Compound database is constructed from the Substance database using a standardization and deduplication process. Each Compound is uniquely identified by a CID.
-
aids
¶ Requires an extra request. Result is cached.
-
atom_stereo_count
¶ Atom stereocenter count.
-
atoms
¶ List of
Atoms
in this Compound.
-
boiling_point
¶ Boiling Point
-
bond_stereo_count
¶ Bond stereocenter count.
-
bonds
¶ List of
Bonds
between Atoms
in this Compound.
-
cactvs_fingerprint
¶ PubChem CACTVS fingerprint.
Each bit in the fingerprint represents the presence or absence of one of 881 chemical substructures.
More information at ftp://ftp.ncbi.nlm.nih.gov/pubchem/specifications/pubchem_fingerprints.txt
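As a rough illustration of how the 881 bits relate to the raw hex string exposed by the fingerprint property, the sketch below decodes a hex-encoded fingerprint into a bit string. It assumes the layout given in the PubChem fingerprint specification (a 4-byte length prefix, then 881 substructure bits, then 7 padding bits). The helper name decode_fingerprint is hypothetical and is not part of this module.

```python
def decode_fingerprint(hex_fingerprint: str) -> str:
    """Return the 881 substructure bits as a string of '0' and '1'.

    Assumed layout: 4-byte length prefix (8 hex characters),
    881 data bits, 7 trailing padding bits.
    """
    payload = hex_fingerprint[8:]            # drop the 4-byte length prefix
    n_bits = len(payload) * 4                # each hex digit encodes 4 bits
    bits = format(int(payload, 16), "0{}b".format(n_bits))
    return bits[:881]                        # drop the trailing padding bits


# Synthetic input: only the first substructure bit is set.
synthetic_bits = "1" + "0" * 880 + "0" * 7   # 888 bits = 111 bytes
synthetic_hex = "00000371" + format(int(synthetic_bits, 2), "0222x")

decoded = decode_fingerprint(synthetic_hex)
print(len(decoded), decoded[0], decoded.count("1"))
```

The prefix 00000371 is simply 881 written as a 4-byte hexadecimal number. The input here is synthetic, not a fingerprint taken from a real record.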
-
canonical_smiles
¶ Canonical SMILES, with no stereochemistry information.
-
charge
¶ Formal charge on this Compound.
-
cid
¶ The PubChem Compound Identifier (CID).
Note
When searching using a SMILES or InChI query that is not present in the PubChem Compound database, an automatically generated record may be returned that contains properties that have been calculated on the fly. These records will not have a CID property.
-
color
¶ Color/Form
-
complexity
¶ Complexity.
-
conformer_id_3d
¶
-
conformer_rmsd_3d
¶
-
coordinate_type
¶
-
covalent_unit_count
¶ Covalently-bonded unit count.
-
defined_atom_stereo_count
¶ Defined atom stereocenter count.
-
defined_bond_stereo_count
¶ Defined bond stereocenter count.
-
density
¶ Density/Specific Gravity
-
dissociation_constant
¶ Dissociation Constants
-
effective_rotor_count_3d
¶
-
elements
¶ List of element symbols for atoms in this Compound.
-
exact_mass
¶ Exact mass.
-
feature_selfoverlap_3d
¶
-
fingerprint
¶ Raw padded and hex-encoded fingerprint, as returned by the PUG REST API.
-
classmethod
from_cid
(cid, **kwargs)[source]¶ Retrieve the Compound record for the specified CID.
Usage:
c = Compound.from_cid(6819)
Parameters: cid (int) – The PubChem Compound Identifier (CID).
-
full_record
¶
-
h_bond_acceptor_count
¶ Hydrogen bond acceptor count.
-
h_bond_donor_count
¶ Hydrogen bond donor count.
-
heat_combustion
¶ Heat of Combustion
-
heavy_atom_count
¶ Heavy atom count.
-
hill_formula
¶
-
inchi
¶ InChI string.
-
inchikey
¶ InChIKey.
-
is_canonicalized
¶ Compound Is Canonicalized
-
isomeric_smiles
¶ Isomeric SMILES.
-
isotope_atom_count
¶ Isotope atom count.
-
iupac_name
¶ Preferred IUPAC name.
-
melting_point
¶ Melting Point
-
mmff94_energy_3d
¶
-
mmff94_partial_charges_3d
¶
-
molecular_formula
¶ Molecular formula.
-
molecular_mass
¶ Molecular Mass.
-
molecular_weight
¶ Molecular Weight.
-
monoisotopic_mass
¶ Monoisotopic mass.
-
multipoles_3d
¶
-
odor
¶ Odor
-
other_props
¶ Other Chemical/Physical Properties
-
partition_coeff
¶ Octanol/Water Partition Coefficient
-
pharmacophore_features_3d
¶
-
record
¶ The raw compound record returned by the PubChem PUG REST service.
-
rotatable_bond_count
¶ Rotatable bond count.
-
shape_fingerprint_3d
¶
-
shape_selfoverlap_3d
¶
-
sids
¶ Requires an extra request. Result is cached.
-
smiles
¶ Canonical SMILES, with no stereochemistry information.
-
solubility
¶ Solubility
-
specific_gravity
¶ Density/Specific Gravity
-
spectral_props
¶ Spectral Properties
-
surface_tension
¶ Surface Tension
-
systematic_name
¶ Systematic IUPAC name.
-
to_dict
(properties=None)[source]¶ Return a dictionary containing Compound data. Optionally specify a list of the desired properties.
synonyms, aids and sids are not included unless explicitly specified using the properties parameter. This is because they each require an extra request.
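The default-exclusion behaviour can be pictured with a small stand-in class. This is a sketch only: FakeCompound, its placeholder values and the EXPENSIVE set are hypothetical, and the real Compound.to_dict may be implemented differently.

```python
class FakeCompound:
    """Hypothetical stand-in for Compound; no network access is involved."""

    ALL_PROPERTIES = ["cid", "charge", "synonyms", "aids", "sids"]
    # Properties assumed to cost one extra request each.
    EXPENSIVE = {"synonyms", "aids", "sids"}

    def __init__(self):
        self.cid = 6819          # placeholder value
        self.charge = 0          # placeholder value
        self.extra_requests = 0  # counts simulated extra requests

    def _expensive(self, value):
        self.extra_requests += 1  # stands in for an additional HTTP request
        return value

    @property
    def synonyms(self):
        return self._expensive(["example name"])

    @property
    def aids(self):
        return self._expensive([])

    @property
    def sids(self):
        return self._expensive([])

    def to_dict(self, properties=None):
        if properties is None:
            # Default: everything except the properties that need a request.
            properties = [p for p in self.ALL_PROPERTIES
                          if p not in self.EXPENSIVE]
        return {name: getattr(self, name) for name in properties}


c = FakeCompound()
print(c.to_dict())                               # cheap properties only
print(c.to_dict(properties=["cid", "synonyms"])) # triggers one simulated request
print(c.extra_requests)
```

Asking for synonyms explicitly is the only call here that increments the request counter, which is the trade-off the default is designed around.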
-
to_series
(properties=None)[source]¶ Return a pandas
Series
containing Compound data. Optionally specify a list of the desired properties.
synonyms, aids and sids are not included unless explicitly specified using the properties parameter. This is because they each require an extra request.
-
tpsa
¶ Topological Polar Surface Area.
-
undefined_atom_stereo_count
¶ Undefined atom stereocenter count.
-
undefined_bond_stereo_count
¶ Undefined bond stereocenter count.
-
vapor_density
¶ Vapor Density
-
vapor_pressure
¶ Vapor Pressure
-
volume_3d
¶
-
xlogp
¶ XLogP.
-
-
class
chemistry_tools.pubchem.compound.
CompoundIdType
[source]¶ -
COMPONENT
= 2¶ Component of the Standardized Form
-
DEPOSITED
= 0¶ Original Deposited Compound
-
IONIZED
= 6¶ Ionized pKa Form of the Standardized Form
-
MIXTURE
= 4¶ Deposited Mixture Component
-
NEUTRALIZED
= 3¶ Neutralized Form of the Standardized Form
-
STANDARDIZED
= 1¶ Standardized Form of the Deposited Compound
-
TAUTOMER
= 5¶ Alternate Tautomer Form of the Standardized Form
-
UNKNOWN
= 255¶ Unspecified or Unknown Compound Type
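The values listed above map naturally onto an integer enumeration. The sketch below mirrors them locally with the standard-library IntEnum. Whether the real CompoundIdType derives from IntEnum is an assumption, and the describe helper is hypothetical.

```python
from enum import IntEnum


class CompoundIdType(IntEnum):
    """Local mirror of the documented values, for illustration only."""
    DEPOSITED = 0
    STANDARDIZED = 1
    COMPONENT = 2
    NEUTRALIZED = 3
    MIXTURE = 4
    TAUTOMER = 5
    IONIZED = 6
    UNKNOWN = 255


def describe(raw_value: int) -> str:
    """Translate a raw integer into a member name, falling back to UNKNOWN."""
    try:
        return CompoundIdType(raw_value).name
    except ValueError:
        return CompoundIdType.UNKNOWN.name


print(describe(1))   # STANDARDIZED
print(describe(42))  # UNKNOWN (42 is not a documented value)
```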
-
errors
¶
Error handling functions
-
exception
chemistry_tools.pubchem.errors.
BadRequestError
(msg='Request is improperly formed')[source]¶ Request is improperly formed (syntax error in the URL, POST body, etc.).
-
exception
chemistry_tools.pubchem.errors.
MethodNotAllowedError
(msg='Request not allowed')[source]¶ Request not allowed (such as invalid MIME type in the HTTP Accept header).
-
exception
chemistry_tools.pubchem.errors.
NotFoundError
(msg='The input record was not found')[source]¶ The input record was not found (e.g. invalid CID).
-
exception
chemistry_tools.pubchem.errors.
PubChemHTTPError
(e)[source]¶ Generic error class to handle all HTTP error codes.
-
exception
chemistry_tools.pubchem.errors.
PubChemPyDeprecationWarning
[source]¶ Warning category for deprecated features.
-
exception
chemistry_tools.pubchem.errors.
PubChemPyError
[source]¶ Base class for all PubChemPy exceptions.
-
exception
chemistry_tools.pubchem.errors.
ResponseParseError
[source]¶ PubChem response is uninterpretable.
-
exception
chemistry_tools.pubchem.errors.
ServerError
(msg='Some problem on the server side')[source]¶ Some problem on the server side (such as a database server down, etc.).
-
exception
chemistry_tools.pubchem.errors.
TimeoutError
(msg='The request timed out')[source]¶ The request timed out, from server overload or too broad a request.
See Avoiding TimeoutError for more information.
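One way to picture how these exceptions relate to HTTP responses is a status-code lookup. The sketch below defines local stand-in classes with the documented names and default messages, and maps the status codes that PUG REST uses for these conditions (400, 404, 405, 500, 504). The class hierarchy, the error_for_status helper and the mapping are assumptions for illustration, not the module's actual implementation.

```python
class PubChemPyError(Exception):
    """Stand-in base class."""


class PubChemHTTPError(PubChemPyError):
    """Stand-in for the generic HTTP error."""

    def __init__(self, msg="HTTP error"):
        super().__init__(msg)
        self.msg = msg


class BadRequestError(PubChemHTTPError):
    def __init__(self, msg="Request is improperly formed"):
        super().__init__(msg)


class NotFoundError(PubChemHTTPError):
    def __init__(self, msg="The input record was not found"):
        super().__init__(msg)


class MethodNotAllowedError(PubChemHTTPError):
    def __init__(self, msg="Request not allowed"):
        super().__init__(msg)


class ServerError(PubChemHTTPError):
    def __init__(self, msg="Some problem on the server side"):
        super().__init__(msg)


class TimeoutError(PubChemHTTPError):  # shadows the builtin, like the documented name
    def __init__(self, msg="The request timed out"):
        super().__init__(msg)


# Assumed mapping from HTTP status code to exception class.
STATUS_TO_ERROR = {
    400: BadRequestError,
    404: NotFoundError,
    405: MethodNotAllowedError,
    500: ServerError,
    504: TimeoutError,
}


def error_for_status(code: int) -> PubChemHTTPError:
    """Return an exception instance for an HTTP status code."""
    return STATUS_TO_ERROR.get(code, PubChemHTTPError)()


try:
    raise error_for_status(404)
except PubChemPyError as exc:
    print(type(exc).__name__, "-", exc)
```

Catching the base class, as in the last lines, handles every specific error in one place. Catching NotFoundError alone is the narrower choice when only a missing record should be tolerated.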
substance
¶
-
class
chemistry_tools.pubchem.substance.
Substance
(record)[source]¶ Corresponds to a single record from the PubChem Substance database.
The PubChem Substance database contains all chemical records deposited in PubChem in their most raw form, before any significant processing is applied. As a result, it contains duplicates, mixtures, and some records that don’t make chemical sense. This means that Substance records contain fewer calculated properties; however, they do have additional information about the original source that deposited the record.
The PubChem Compound database is constructed from the Substance database using a standardization and deduplication process. Hence each Compound may be derived from a number of different Substances.
-
aids
¶ A list of all AIDs for Assays associated with this Substance.
Requires an extra request. Result is cached.
-
cids
¶ A list of all CIDs for Compounds that were produced when this Substance was standardized.
Requires an extra request. Result is cached.
-
deposited_compound
¶ Return a
Compound
produced from the unstandardized Substance record as deposited.
The resulting
Compound
will not have a cid
and will be missing most properties.
-
classmethod
from_sid
(sid)[source]¶ Retrieve the Substance record for the specified SID.
Parameters: sid (int) – The PubChem Substance Identifier (SID).
-
sid
¶ The PubChem Substance Identifier (SID).
-
source_id
¶ Unique ID for this Substance within those from the same PubChem depositor source.
-
source_name
¶ The name of the PubChem depositor that was the source of this Substance.
-
standardized_cid
¶ The CID of the Compound that was produced when this Substance was standardized.
May not exist if this Substance was not standardizable.
-
standardized_compound
¶ Return the
Compound
that was produced when this Substance was standardized.
Requires an extra request. Result is cached.
-
synonyms
¶ A ranked list of all the names associated with this Substance.
-
to_dict
(properties=None)[source]¶ Return a dictionary containing Substance data.
If the properties parameter is not specified, everything except cids and aids is included. This is because the aids and cids properties each require an extra request to retrieve.
Parameters: properties – (optional) A list of the desired properties.
-
to_series
(properties=None)[source]¶ Return a pandas
Series
containing Substance data.
If the properties parameter is not specified, everything except cids and aids is included. This is because the aids and cids properties each require an extra request to retrieve.
Parameters: properties – (optional) A list of the desired properties.
-
utils
¶
Various tools
-
chemistry_tools.pubchem.utils.
download
(outformat, path, identifier, namespace='cid', domain='compound', operation=None, searchtype=None, overwrite=False, **kwargs)[source]¶ The output format (outformat) can be XML, ASNT/B, JSON, SDF, CSV, PNG or TXT.
-
chemistry_tools.pubchem.utils.
format_string
(stringwithmarkup)[source]¶ Convert a PubChem formatted string into an HTML formatted string
-
chemistry_tools.pubchem.utils.
get
(identifier, namespace='cid', domain='compound', operation=None, output='JSON', searchtype=None, **kwargs)[source]¶ Request wrapper that automatically handles async requests.
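PUG REST may answer a long-running query with a Waiting message that carries a ListKey instead of the final result, and the caller then polls with that key until the result is ready. The sketch below shows this polling pattern with an injected fetch callable so that it runs without network access. The names get_with_polling and fake_fetch, the two-second default interval and the placeholder payload are assumptions for illustration, not the signature or behaviour of get.

```python
import time


def get_with_polling(fetch, identifier, namespace="cid", interval=2.0):
    """Call fetch(identifier, namespace) and poll while the reply says 'Waiting'."""
    status = fetch(identifier, namespace)
    while "Waiting" in status and "ListKey" in status["Waiting"]:
        # Switch to the list key handed back by the server and try again later.
        identifier = status["Waiting"]["ListKey"]
        namespace = "listkey"
        time.sleep(interval)
        status = fetch(identifier, namespace)
    return status


# Fake backend: the first call asks the caller to wait, the second returns data.
calls = []


def fake_fetch(identifier, namespace):
    calls.append((identifier, namespace))
    if namespace != "listkey":
        return {"Waiting": {"ListKey": "12345"}}
    return {"Result": "placeholder payload"}


result = get_with_polling(fake_fetch, "CCO", namespace="smiles", interval=0.0)
print(result)
print(calls)
```

Injecting fetch keeps the loop testable. A real caller would also want an upper bound on the number of polls so that a stuck job cannot block forever.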
-
chemistry_tools.pubchem.utils.
get_aids
(identifier, namespace='cid', domain='compound', searchtype=None, **kwargs)[source]¶
-
chemistry_tools.pubchem.utils.
get_all_sources
(domain='substance')[source]¶ Return a list of all current depositors of substances or assays.
-
chemistry_tools.pubchem.utils.
get_cids
(identifier, namespace='name', domain='compound', searchtype=None, **kwargs)[source]¶
-
chemistry_tools.pubchem.utils.
get_json
(identifier, namespace='cid', domain='compound', operation=None, searchtype=None, **kwargs)[source]¶ Request wrapper that automatically parses JSON response and suppresses NotFoundError.
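A minimal sketch of what "suppresses NotFoundError" can look like in practice follows. It assumes that a suppressed error results in None being returned, which this page does not state. NotFoundError here is a local stand-in, and get_json_or_none and fake_fetch are hypothetical names.

```python
import json


class NotFoundError(Exception):
    """Local stand-in for chemistry_tools.pubchem.errors.NotFoundError."""


def get_json_or_none(fetch, identifier):
    """Parse the JSON text returned by fetch, or return None if nothing was found."""
    try:
        return json.loads(fetch(identifier))
    except NotFoundError:
        return None  # assumption: "suppressed" means None is returned


def fake_fetch(identifier):
    """Hypothetical backend that only knows one identifier."""
    if identifier != 6819:
        raise NotFoundError("The input record was not found")
    return '{"found": true}'


print(get_json_or_none(fake_fetch, 6819))  # {'found': True}
print(get_json_or_none(fake_fetch, -1))    # None
```

Only NotFoundError is swallowed in this sketch, so other failures such as a server error would still propagate to the caller.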
-
chemistry_tools.pubchem.utils.
get_properties
(properties, identifier, namespace='cid', searchtype=None, as_dataframe=False, **kwargs)[source]¶ Retrieve the specified properties from PubChem.
Parameters: - identifier – The compound, substance or assay identifier to use as a search query.
- namespace – (optional) The identifier type.
- searchtype – (optional) The advanced search type, one of substructure, superstructure or similarity.
- as_dataframe – (optional) Automatically extract the properties into a pandas
DataFrame
.
-
chemistry_tools.pubchem.utils.
get_sdf
(identifier, namespace='cid', domain='compound', operation=None, searchtype=None, **kwargs)[source]¶ Request wrapper that automatically parses SDF response and suppresses NotFoundError.
-
chemistry_tools.pubchem.utils.
get_sids
(identifier, namespace='cid', domain='compound', searchtype=None, **kwargs)[source]¶
-
chemistry_tools.pubchem.utils.
get_synonyms
(identifier, namespace='cid', domain='compound', searchtype=None, **kwargs)[source]¶
-
chemistry_tools.pubchem.utils.
memoized_property
(fget)[source]¶ Decorator to create memoized properties.
Used to cache
Compound
and Substance
properties that require an additional request.
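One common way to build such a decorator is to store the computed value on the instance under a private attribute name. The sketch below follows that approach. It is not necessarily how this module implements memoized_property, and the Record class is a hypothetical example.

```python
import functools


def memoized_property(fget):
    """Property decorator that computes the value once per instance."""
    attr_name = "_" + fget.__name__

    @functools.wraps(fget)
    def fget_memoized(self):
        if not hasattr(self, attr_name):
            # First access: compute and cache on the instance.
            setattr(self, attr_name, fget(self))
        return getattr(self, attr_name)

    return property(fget_memoized)


class Record:
    """Hypothetical class with one expensive property."""

    def __init__(self):
        self.requests_made = 0

    @memoized_property
    def synonyms(self):
        self.requests_made += 1  # stands in for an extra HTTP request
        return ["placeholder"]


r = Record()
first = r.synonyms
second = r.synonyms
print(r.requests_made)  # 1
```

Because the cache lives on the instance, two separate objects never share a cached value, and the value is released together with the object.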
-
chemistry_tools.pubchem.utils.
request
(identifier, namespace='cid', domain='compound', operation=None, output='JSON', searchtype=None, **kwargs)[source]¶ Construct API request from parameters and return the response.
Full specification at http://pubchem.ncbi.nlm.nih.gov/pug_rest/PUG_REST.html
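To make the parameter names concrete, the sketch below assembles a PUG REST URL from the same pieces (domain, namespace, identifier, operation, output). It is a simplified illustration: build_url and API_BASE are hypothetical names, searchtype and keyword arguments are ignored, and the real request function may send identifiers differently (for example in a POST body).

```python
from urllib.parse import quote

API_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def build_url(identifier, namespace="cid", domain="compound",
              operation=None, output="JSON"):
    """Join the non-empty path components of a PUG REST URL."""
    if isinstance(identifier, (list, tuple)):
        # Several identifiers are passed as a comma-separated list.
        identifier = ",".join(str(i) for i in identifier)
    parts = [
        domain,
        namespace,
        quote(str(identifier), safe=","),  # escape characters unsafe in a path
        operation,
        output,
    ]
    return API_BASE + "/" + "/".join(p for p in parts if p)


print(build_url(2244))
print(build_url([1, 2, 3], operation="property/MolecularFormula"))
```

Structure queries such as SMILES strings often contain characters that are awkward in a URL path, which is one reason a production implementation may prefer sending the identifier in the request body.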