chemistry_tools.pubchem.compound

Attention

This module has the following additional requirements:

cawdrey>=0.1.7
mathematical>=0.1.13
pillow>=7.0.0; platform_python_implementation == "PyPy" and python_version != "3.6"
pillow>=7.0.0; platform_python_implementation != "PyPy"
pillow<=8.0.0,>=7.0.0; platform_python_implementation == "PyPy" and python_version == "3.6"
pyparsing>=2.4.6
tabulate>=0.8.9

These can be installed as follows:

python -m pip install chemistry-tools[pubchem]

Represents a chemical compound.

Data:

C

Invariant TypeVar bound to chemistry_tools.pubchem.compound.Compound.

Classes:

Compound(title, CID, description, **_)

Corresponds to a single record from the PubChem Compound database.

Functions:

compounds_to_frame(compounds)

Construct a pandas.DataFrame from a Compound or a list of Compound objects.

C = TypeVar('C', bound=Compound)

Type:    TypeVar

Invariant TypeVar bound to chemistry_tools.pubchem.compound.Compound.
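C is a type variable whose bound is Compound, so it can stand for Compound or any subclass of it. A common use of such a bound TypeVar is to annotate alternate constructors (a classmethod such as from_cid, for example) so that calling them on a subclass is typed as returning that subclass. The sketch below shows the general pattern with a hypothetical Record class; it is not taken from chemistry_tools, and whether C is used exactly this way there is an assumption.

```python
from typing import Type, TypeVar

# Bound given as a string because Record is defined below.
R = TypeVar("R", bound="Record")


class Record:
    """Hypothetical stand-in for a base class such as Compound."""

    def __init__(self, cid: int) -> None:
        self.cid = cid

    @classmethod
    def from_id(cls: Type[R], cid: int) -> R:
        # Annotating cls with the bound TypeVar lets a type checker infer
        # that SubRecord.from_id(...) returns a SubRecord, not just a Record.
        return cls(cid)


class SubRecord(Record):
    """Hypothetical subclass; from_id returns instances of it."""


rec = SubRecord.from_id(2244)
```

Without the TypeVar, from_id would have to be annotated as returning Record, and callers working with a subclass would need a cast.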

class Compound(title, CID, description, **_)

Bases: Dictable

Corresponds to a single record from the PubChem Compound database.

The PubChem Compound database is constructed from the Substance database using a standardization and deduplication process. Each Compound is uniquely identified by a CID.

Parameters
  • title (str) – The title of the compound record (usually the name of the compound)

  • CID (int) – The PubChem Compound Identifier (CID) of the compound.

  • description – A description of the compound.

Methods:

__repr__()

Return a string representation of the Compound.

from_cid(cid[, record_type])

Returns the Compound object for the compound with the given CID.

get_iupac_name([type_])

Return the IUPAC name of this compound.

get_properties(properties)

Returns the requested properties for the Compound.

get_property(prop)

Get a single property for the compound.

precache()

Precache all properties for this compound.

to_series()

Return a pandas Series containing Compound data.

Attributes:

atoms

List of Atoms in this Compound.

bonds

List of Bonds between Atoms in this Compound.

cactvs_fingerprint

PubChem CACTVS fingerprint.

canonical_smiles

Canonical SMILES, with no stereochemistry information.

canonicalized

Whether the compound is canonicalized.

charge

The charge of the compound.

cid

Returns the PubChem Compound ID (CID) of this compound.

coordinate_type

The coordinate type of this compound.

elements

List of element symbols for atoms in this Compound.

fingerprint

Raw padded and hex-encoded fingerprint, as returned by the PUG REST API.

has_full_record

Returns whether this compound has a full record available.

iupac_name

The preferred IUPAC name of this compound.

molecular_formula

Molecular formula.

molecular_mass

Molecular weight.

molecular_weight

Molecular weight.

smiles

Canonical SMILES, with no stereochemistry information.

synonyms

Returns a list of synonyms for the Compound.

systematic_name

The systematic IUPAC name of this compound.

__repr__()

Return a string representation of the Compound.

Return type

str

property atoms

List of Atoms in this Compound.

Return type

List[Atom]

property bonds

List of Bonds between Atoms in this Compound.

Return type

List[Bond]

property cactvs_fingerprint

PubChem CACTVS fingerprint.

Each bit in the fingerprint represents the presence or absence of one of 881 chemical substructures.

Return type

Optional[str]
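The fingerprint property returns the raw hex string from which this 881-bit fingerprint is derived. Assuming the layout PubChem documents for its substructure fingerprint (a 4-byte prefix holding the bit count, followed by the 881 bits zero-padded to 111 whole bytes), the bit string can be recovered roughly as follows. Both helper names are hypothetical, and this is a sketch rather than the library's own code.

```python
def decode_cactvs_fingerprint(hex_fp: str, n_bits: int = 881) -> str:
    """Turn a hex-encoded fingerprint into a string of n_bits '0'/'1' characters.

    Assumes an 8-hex-digit (4-byte) length prefix, then the bits padded
    with zeros to a whole number of bytes.
    """
    payload = hex_fp[8:]  # drop the 4-byte length prefix
    # Format with an explicit width so leading zero bits are kept.
    bits = format(int(payload, 16), f"0{len(payload) * 4}b")
    return bits[:n_bits]  # drop the trailing padding bits


def encode_cactvs_fingerprint(bits: str) -> str:
    """Hypothetical inverse, used here only to build example input."""
    padded = bits + "0" * (-len(bits) % 8)  # pad to a byte boundary
    body = format(int(padded, 2), f"0{len(padded) // 4}X")
    return format(len(bits), "08X") + body


bits = "1" + "0" * 879 + "1"
fp = encode_cactvs_fingerprint(bits)
on = [i for i, b in enumerate(decode_cactvs_fingerprint(fp)) if b == "1"]
```

With 881 bits, 7 padding bits are needed to reach 888 (111 bytes), so the hex string is 8 + 222 = 230 characters long. The positions in on are the substructure keys that are set.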

property canonical_smiles

Canonical SMILES, with no stereochemistry information.

Return type

str

property canonicalized

Whether the compound is canonicalized.

Return type

bool

property charge

The charge of the compound.

Return type

int

property cid

Returns the PubChem Compound ID (CID) of this compound.

Return type

int

property coordinate_type

The coordinate type of this compound.

Return type

Optional[str]

property elements

List of element symbols for atoms in this Compound.

Return type

List[str]
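Since elements lists one symbol per atom, it carries enough information to rebuild a molecular formula. The sketch below orders the counts using the Hill system (carbon first, then hydrogen, then everything else alphabetically; if there is no carbon, every element is alphabetical). It assumes the list includes explicit hydrogen atoms and it ignores charge and isotopes; the molecular_formula property returns a Formula object and is not necessarily built this way.

```python
from collections import Counter
from typing import Iterable


def hill_formula(elements: Iterable[str]) -> str:
    """Build a Hill-system formula string from element symbols, one per atom."""
    counts = Counter(elements)
    if "C" in counts:
        # Carbon first, then hydrogen, then the remaining elements alphabetically.
        order = ["C"] + (["H"] if "H" in counts else [])
        order += sorted(e for e in counts if e not in ("C", "H"))
    else:
        # Without carbon, all elements (hydrogen included) are alphabetical.
        order = sorted(counts)
    # A count of 1 is conventionally omitted.
    return "".join(e + (str(counts[e]) if counts[e] > 1 else "") for e in order)


aspirin = ["C"] * 9 + ["H"] * 8 + ["O"] * 4
print(hill_formula(aspirin))  # C9H8O4
```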

property fingerprint

Raw padded and hex-encoded fingerprint, as returned by the PUG REST API.

Return type

Optional[str]

classmethod from_cid(cid, record_type='2d')

Returns the Compound object for the compound with the given CID.

Return type

Compound
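The record is fetched from PubChem, whose PUG REST service accepts a record_type query option of 2d or 3d when retrieving a full compound record. As a sketch, assuming from_cid issues a request of this general shape (the exact URL chemistry_tools uses is not documented here), the request URL could be built as below; compound_record_url is a hypothetical helper.

```python
from urllib.parse import urlencode

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def compound_record_url(cid: int, record_type: str = "2d") -> str:
    """Build a PUG REST URL for a full compound record in JSON (hypothetical helper)."""
    if record_type not in ("2d", "3d"):
        raise ValueError(f"record_type must be '2d' or '3d', not {record_type!r}")
    query = urlencode({"record_type": record_type})
    return f"{PUG_REST}/compound/cid/{int(cid)}/JSON?{query}"


url = compound_record_url(2244, "3d")
```

Validating record_type before building the URL turns a typo into an immediate ValueError instead of an HTTP error from the server.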

get_iupac_name(type_='Systematic')

Return the IUPAC name of this compound.

Parameters

type_ (str) – The type of IUPAC name. Default 'Systematic'.

Return type

Optional[str]
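PubChem compound records in JSON keep names in a props list, where each entry has a urn with a label (such as "IUPAC Name") and a name giving the name type (for example "Preferred" or "Systematic"). Assuming the record is stored in that form, a lookup by type, returning None when the type is absent, might look like this. The helper and the abbreviated example entries are illustrative only, not a verbatim PubChem record or the library's implementation.

```python
from typing import Any, Dict, List, Optional


def iupac_name_from_props(props: List[Dict[str, Any]], type_: str = "Systematic") -> Optional[str]:
    """Return the IUPAC name of the requested type, or None if it is absent."""
    for prop in props:
        urn = prop.get("urn", {})
        if urn.get("label") == "IUPAC Name" and urn.get("name") == type_:
            return prop.get("value", {}).get("sval")
    return None


# Illustrative, abbreviated entries (not copied from a real record).
props = [
    {"urn": {"label": "IUPAC Name", "name": "Preferred"},
     "value": {"sval": "2-acetyloxybenzoic acid"}},
    {"urn": {"label": "IUPAC Name", "name": "Traditional"},
     "value": {"sval": "2-acetoxybenzoic acid"}},
]
```

Returning None rather than raising matches the documented Optional[str] return type.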

get_properties(properties)

Returns the requested properties for the Compound.

Parameters

properties (Union[Sequence[str], str]) – The properties to retrieve for the compound. See the table below. Can be either a comma-separated string or a list.

Property

Description

MolecularFormula

Molecular formula.

MolecularWeight

The molecular weight is the sum of all atomic weights of the constituent atoms in a compound, measured in g/mol. In the absence of explicit isotope labelling, averaged natural abundance is assumed. If an atom bears an explicit isotope label, 100% isotopic purity is assumed at this location.

CanonicalSMILES

Canonical SMILES (Simplified Molecular Input Line Entry System) string. It is a unique SMILES string of a compound, generated by a “canonicalization” algorithm.

IsomericSMILES

Isomeric SMILES string. It is a SMILES string with stereochemical and isotopic specifications.

InChI

Standard IUPAC International Chemical Identifier (InChI). It does not allow for user-selectable options in dealing with the stereochemistry and tautomer layers of the InChI string.

InChIKey

Hashed version of the full standard InChI, consisting of 27 characters.

IUPACName

Chemical name systematically determined according to IUPAC nomenclature.

XLogP

Computationally generated octanol-water partition coefficient or distribution coefficient. XLogP is used as a measure of hydrophilicity or hydrophobicity of a molecule.

ExactMass

The mass of the most likely isotopic composition for a single molecule, corresponding to the most intense ion/molecule peak in a mass spectrum.

MonoisotopicMass

The mass of a molecule, calculated using the mass of the most abundant isotope of each element.

TPSA

Topological polar surface area, computed by the algorithm described in the paper by Ertl et al.

Complexity

The molecular complexity rating of a compound, computed using the Bertz/Hendrickson/Ihlenfeldt formula.

Charge

The total (or net) charge of a molecule.

HBondDonorCount

Number of hydrogen-bond donors in the structure.

HBondAcceptorCount

Number of hydrogen-bond acceptors in the structure.

RotatableBondCount

Number of rotatable bonds.

HeavyAtomCount

Number of non-hydrogen atoms.

IsotopeAtomCount

Number of atoms with enriched isotope(s).

AtomStereoCount

Total number of atoms with tetrahedral (sp3) stereo [e.g., (R)- or (S)-configuration].

DefinedAtomStereoCount

Number of atoms with defined tetrahedral (sp3) stereo.

UndefinedAtomStereoCount

Number of atoms with undefined tetrahedral (sp3) stereo.

BondStereoCount

Total number of bonds with planar (sp2) stereo [e.g., (E)- or (Z)-configuration].

DefinedBondStereoCount

Number of bonds with defined planar (sp2) stereo.

UndefinedBondStereoCount

Number of bonds with undefined planar (sp2) stereo.

CovalentUnitCount

Number of covalently bound units.

Volume3D

Analytic volume of the first diverse conformer (default conformer) for a compound.

XStericQuadrupole3D

The x component of the quadrupole moment (Qx) of the first diverse conformer (default conformer) for a compound.

YStericQuadrupole3D

The y component of the quadrupole moment (Qy) of the first diverse conformer (default conformer) for a compound.

ZStericQuadrupole3D

The z component of the quadrupole moment (Qz) of the first diverse conformer (default conformer) for a compound.

FeatureCount3D

Total number of 3D features (the sum of FeatureAcceptorCount3D, FeatureDonorCount3D, FeatureAnionCount3D, FeatureCationCount3D, FeatureRingCount3D and FeatureHydrophobeCount3D).

FeatureAcceptorCount3D

Number of hydrogen-bond acceptors of a conformer.

FeatureDonorCount3D

Number of hydrogen-bond donors of a conformer.

FeatureAnionCount3D

Number of anionic centers (at pH 7) of a conformer.

FeatureCationCount3D

Number of cationic centers (at pH 7) of a conformer.

FeatureRingCount3D

Number of rings of a conformer.

FeatureHydrophobeCount3D

Number of hydrophobes of a conformer.

ConformerModelRMSD3D

Conformer sampling RMSD in Å.

EffectiveRotorCount3D

Number of effective rotors, a measure of the conformational flexibility of a compound.

ConformerCount3D

The number of conformers in the conformer model for a compound.

Fingerprint2D

Base64-encoded PubChem Substructure Fingerprint of a molecule.

Return type

Dict[str, Any]

Returns

Dictionary mapping property names to their values.
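The properties argument may be a comma-separated string or a list of names. A sketch of normalising both forms to one list, and of the PubChem PUG REST property-table URL such a request maps onto, is shown below. Both helpers are hypothetical; that get_properties issues exactly this request is an assumption.

```python
from typing import List, Sequence, Union

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def normalise_properties(properties: Union[Sequence[str], str]) -> List[str]:
    """Accept 'A,B' or ['A', 'B'] and return a clean list of property names."""
    # Check for str first: a str is itself a Sequence[str] of characters.
    if isinstance(properties, str):
        properties = properties.split(",")
    return [p.strip() for p in properties if p.strip()]


def property_table_url(cid: int, properties: Union[Sequence[str], str]) -> str:
    """Hypothetical helper: PUG REST URL for a property table in JSON."""
    names = ",".join(normalise_properties(properties))
    return f"{PUG_REST}/compound/cid/{int(cid)}/property/{names}/JSON"


url = property_table_url(2244, "MolecularFormula, XLogP")
```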

get_property(prop)

Get a single property for the compound.

Parameters

prop (str) – The property to retrieve for the compound. See the table below.

Property

Description

MolecularFormula

Molecular formula.

MolecularWeight

The molecular weight is the sum of all atomic weights of the constituent atoms in a compound, measured in g/mol. In the absence of explicit isotope labelling, averaged natural abundance is assumed. If an atom bears an explicit isotope label, 100% isotopic purity is assumed at this location.

CanonicalSMILES

Canonical SMILES (Simplified Molecular Input Line Entry System) string. It is a unique SMILES string of a compound, generated by a “canonicalization” algorithm.

IsomericSMILES

Isomeric SMILES string. It is a SMILES string with stereochemical and isotopic specifications.

InChI

Standard IUPAC International Chemical Identifier (InChI). It does not allow for user-selectable options in dealing with the stereochemistry and tautomer layers of the InChI string.

InChIKey

Hashed version of the full standard InChI, consisting of 27 characters.

IUPACName

Chemical name systematically determined according to IUPAC nomenclature.

XLogP

Computationally generated octanol-water partition coefficient or distribution coefficient. XLogP is used as a measure of hydrophilicity or hydrophobicity of a molecule.

ExactMass

The mass of the most likely isotopic composition for a single molecule, corresponding to the most intense ion/molecule peak in a mass spectrum.

MonoisotopicMass

The mass of a molecule, calculated using the mass of the most abundant isotope of each element.

TPSA

Topological polar surface area, computed by the algorithm described in the paper by Ertl et al.

Complexity

The molecular complexity rating of a compound, computed using the Bertz/Hendrickson/Ihlenfeldt formula.

Charge

The total (or net) charge of a molecule.

HBondDonorCount

Number of hydrogen-bond donors in the structure.

HBondAcceptorCount

Number of hydrogen-bond acceptors in the structure.

RotatableBondCount

Number of rotatable bonds.

HeavyAtomCount

Number of non-hydrogen atoms.

IsotopeAtomCount

Number of atoms with enriched isotope(s).

AtomStereoCount

Total number of atoms with tetrahedral (sp3) stereo [e.g., (R)- or (S)-configuration].

DefinedAtomStereoCount

Number of atoms with defined tetrahedral (sp3) stereo.

UndefinedAtomStereoCount

Number of atoms with undefined tetrahedral (sp3) stereo.

BondStereoCount

Total number of bonds with planar (sp2) stereo [e.g., (E)- or (Z)-configuration].

DefinedBondStereoCount

Number of bonds with defined planar (sp2) stereo.

UndefinedBondStereoCount

Number of bonds with undefined planar (sp2) stereo.

CovalentUnitCount

Number of covalently bound units.

Volume3D

Analytic volume of the first diverse conformer (default conformer) for a compound.

XStericQuadrupole3D

The x component of the quadrupole moment (Qx) of the first diverse conformer (default conformer) for a compound.

YStericQuadrupole3D

The y component of the quadrupole moment (Qy) of the first diverse conformer (default conformer) for a compound.

ZStericQuadrupole3D

The z component of the quadrupole moment (Qz) of the first diverse conformer (default conformer) for a compound.

FeatureCount3D

Total number of 3D features (the sum of FeatureAcceptorCount3D, FeatureDonorCount3D, FeatureAnionCount3D, FeatureCationCount3D, FeatureRingCount3D and FeatureHydrophobeCount3D).

FeatureAcceptorCount3D

Number of hydrogen-bond acceptors of a conformer.

FeatureDonorCount3D

Number of hydrogen-bond donors of a conformer.

FeatureAnionCount3D

Number of anionic centers (at pH 7) of a conformer.

FeatureCationCount3D

Number of cationic centers (at pH 7) of a conformer.

FeatureRingCount3D

Number of rings of a conformer.

FeatureHydrophobeCount3D

Number of hydrophobes of a conformer.

ConformerModelRMSD3D

Conformer sampling RMSD in Å.

EffectiveRotorCount3D

Number of effective rotors, a measure of the conformational flexibility of a compound.

ConformerCount3D

The number of conformers in the conformer model for a compound.

Fingerprint2D

Base64-encoded PubChem Substructure Fingerprint of a molecule.

Return type

Any
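As a worked check of the MonoisotopicMass definition in the table above (the mass of the most abundant isotope of each element, summed over all atoms), the sketch below computes it for aspirin, C9H8O4. The isotope masses for 1H, 12C, 14N and 16O are as tabulated by NIST; PubChem's own computation may differ in the last digits.

```python
from typing import Dict

# Mass (in u) of the most abundant isotope of each element: 1H, 12C, 14N, 16O.
MONOISOTOPIC_MASS = {
    "H": 1.00782503223,
    "C": 12.0,
    "N": 14.00307400443,
    "O": 15.99491461957,
}


def monoisotopic_mass(composition: Dict[str, int]) -> float:
    """Sum the most-abundant-isotope mass over every atom in the composition."""
    return sum(MONOISOTOPIC_MASS[el] * n for el, n in composition.items())


aspirin = {"C": 9, "H": 8, "O": 4}
print(round(monoisotopic_mass(aspirin), 6))  # 180.042259
```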

property has_full_record

Returns whether this compound has a full record available.

Return type

bool

property iupac_name

The preferred IUPAC name of this compound.

Return type

Optional[str]

property molecular_formula

Molecular formula.

Return type

Formula

property molecular_mass

Molecular weight.

Return type

float

property molecular_weight

Molecular weight.

Return type

float
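Unlike the monoisotopic mass, the molecular weight averages over natural isotopic abundance (see the MolecularWeight entry under get_properties). A minimal sketch, assuming IUPAC's abridged standard atomic weights and an element-count composition, reproduces the familiar value for aspirin; this is an illustration, not how the property is computed.

```python
from typing import Dict

# Abridged standard atomic weights (g/mol) published by IUPAC.
STANDARD_ATOMIC_WEIGHT = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999}


def average_molecular_weight(composition: Dict[str, int]) -> float:
    """Sum the standard atomic weight over every atom in the composition."""
    return sum(STANDARD_ATOMIC_WEIGHT[el] * n for el, n in composition.items())


print(round(average_molecular_weight({"C": 9, "H": 8, "O": 4}), 2))  # 180.16
```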

precache()

Precache all properties for this compound.

property smiles

Canonical SMILES, with no stereochemistry information.

Return type

str

property synonyms

Returns a list of synonyms for the Compound.

Return type

Optional[List[str]]

property systematic_name

The systematic IUPAC name of this compound.

Return type

Optional[str]

to_series()

Return a pandas Series containing Compound data.

Return type

Series

compounds_to_frame(compounds)

Construct a pandas.DataFrame from a Compound or a list of Compound objects.

Parameters

compounds (Union[Compound, List[Compound]]) – The compound, or list of compounds, to include in the DataFrame.

Return type

DataFrame
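Because each Compound can produce a pandas Series via to_series(), one plausible way to build the frame is to stack those Series as rows. The sketch below assumes that approach and uses a hypothetical StubCompound in place of a real Compound (which would need a PubChem lookup); it is not the library's implementation.

```python
from typing import List, Union

import pandas as pd


class StubCompound:
    """Hypothetical stand-in exposing a to_series() method like Compound."""

    def __init__(self, cid: int, formula: str) -> None:
        self.cid = cid
        self.formula = formula

    def to_series(self) -> pd.Series:
        return pd.Series({"CID": self.cid, "MolecularFormula": self.formula})


def stubs_to_frame(compounds: Union[StubCompound, List[StubCompound]]) -> pd.DataFrame:
    # Accept a single object as well as a list, mirroring the documented signature.
    if not isinstance(compounds, list):
        compounds = [compounds]
    # Each Series becomes one row; its index labels become the columns.
    return pd.DataFrame([c.to_series() for c in compounds])


frame = stubs_to_frame([StubCompound(2244, "C9H8O4"), StubCompound(962, "H2O")])
single = stubs_to_frame(StubCompound(2244, "C9H8O4"))
```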