chemistry_tools.names
Functions for working with IUPAC names for chemicals.
Functions:
Returns the corresponding CAS registry number for the given IUPAC name.
Splits an IUPAC name for a compound into its constituent parts.
Returns the order the given IUPAC names should be sorted in.
Returns the constituent parts of the IUPAC names sorted into order.
Returns the corresponding IUPAC name for the given CAS registry number.
Sort a list of IUPAC names into order.
Sort a list of lists by the IUPAC name in each row.
Sorts a pandas.DataFrame by the IUPAC name in each row.
Data:
Regular expression to match “multiple” prefixes such as mono-.
List of regular expressions to decompose an IUPAC name.
-
cas_from_iupac_name(iupac_name)
Returns the corresponding CAS registry number for the given IUPAC name.
-
get_IUPAC_sort_order(iupac_names)
Returns the order the given IUPAC names should be sorted in.
Useful when sorting arrays that contain data in addition to the name, e.g.:
>>> sort_order = get_IUPAC_sort_order([row[0] for row in data])
>>> sorted_data = sorted(data, key=lambda row: sort_order[row[0]])
where row[0] is the name of the compound.
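The example above indexes the returned value by compound name, which implies a mapping from each name to its position. A minimal, self-contained sketch of that pattern follows. `toy_sort_order` is a hypothetical stand-in that ranks names alphabetically; it is not part of this module and does not reproduce the actual IUPAC ordering. The data values are placeholders.

```python
def toy_sort_order(names):
    """Hypothetical stand-in: map each name to its rank in plain alphabetical order."""
    return {name: rank for rank, name in enumerate(sorted(set(names)))}


# Each row holds a compound name followed by other data (placeholder values).
data = [
    ["toluene", 1],
    ["aniline", 2],
    ["benzene", 3],
]

# Build the name -> position mapping once, then use it as the sort key.
sort_order = toy_sort_order([row[0] for row in data])
sorted_data = sorted(data, key=lambda row: sort_order[row[0]])
```

Building the mapping once and looking names up in the key function means the ordering is computed a single time, however many columns each row carries.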
-
get_sorted_parts(iupac_names)
Returns the constituent parts of the IUPAC names sorted into order.
The parts returned are in reverse order (i.e. 'diphenylamine' becomes ['amine', 'phenyl', 'di']).
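To illustrate what a reversed decomposition looks like, here is a rough sketch that splits a name left to right with a handful of patterns and then reverses the result. `toy_parts` and `PATTERNS` are hypothetical names introduced for this example, and the scanning strategy is an assumption; the module's own algorithm may work differently.

```python
import re

# A small subset of patterns, chosen for illustration only.
PATTERNS = [
    re.compile(p)
    for p in ("di", "tri", "nitro", "phenyl", "toluene", "amin[oe]")
]


def toy_parts(name):
    """Hypothetical helper: split a name left to right, then reverse the parts."""
    parts = []
    pos = 0
    while pos < len(name):
        for pattern in PATTERNS:
            match = pattern.match(name, pos)
            if match and match.end() > pos:
                parts.append(match.group(0))
                pos = match.end()
                break
        else:
            # No pattern matched here: keep the single character as its own part.
            parts.append(name[pos])
            pos += 1
    return parts[::-1]


diphenylamine_parts = toy_parts("diphenylamine")
tnt_parts = toy_parts("trinitrotoluene")
```

With this toy pattern set, `diphenylamine_parts` is `['amine', 'phenyl', 'di']`, matching the reversed order described above.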
-
iupac_name_from_cas(cas_number)
Returns the corresponding IUPAC name for the given CAS registry number.
-
multiplier_regex
Type: Pattern
Regular expression to match “multiple” prefixes such as mono-.
Pattern:
(mono)*(di)*(tri)*(tetra)*(penta)*(hexa)*(hepta)*(octa)*(nona)*(deca)*(undeca)*(dodeca)*(trideca)*(tetradeca)*(pentadeca)*(hexadeca)*(heptadeca)*(octadeca)*(nonadeca)*(icosa)*(henicosa)*(docosa)*(tricosa)*(triaconta)*(hentriaconta)*(dotriaconta)*(tetraconta)*(pentaconta)*(hexaconta)*(heptaconta)*(octaconta)*(nonaconta)*(hecta)*(dicta)*(tricta)*(tetracta)*(pentacta)*(hexacta)*(heptacta)*(octacta)*(nonacta)*(kilia)*(dilia)*(trilia)*(tetralia)*(pentalia)*(hexalia)*(heptalia)*(octalia)*(nonalia)*
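Because every group in this pattern is followed by `*`, the expression can also match the empty string, so a successful match does not by itself mean a prefix was found. The sketch below uses a shortened copy of the pattern (an assumption made for brevity, named `short_multiplier` here) to show both cases.

```python
import re

# Shortened version of the pattern above, for illustration only.
short_multiplier = re.compile("(mono)*(di)*(tri)*(tetra)*")

# A name that starts with a multiplying prefix yields that prefix.
di_prefix = short_multiplier.match("diphenylamine").group(0)
tetra_prefix = short_multiplier.match("tetranitromethane").group(0)

# A name without a prefix still matches, but the matched text is empty.
no_prefix = short_multiplier.match("phenylamine").group(0)

# Checking for a non-empty match distinguishes the two cases.
has_prefix = bool(no_prefix)
```

Code that relies on this kind of pattern should test the matched text rather than just the match object.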
-
re_strings = [re.compile('((\\d+),?)+(\\d+)-'), re.compile('(mono)*(di)*(tri)*(tetra)*(penta)*(hexa)*(hepta)*(octa)*(nona)*(deca)*(undeca)*(dodeca)*(trideca)*(tetradeca)*(pentadeca)*(hexadeca)*(heptadeca)*(octadeca)*(nonadeca)*(icosa)*(henicosa)*(docosa)*(tri), re.compile('nitro'), re.compile('phenyl'), re.compile('aniline'), re.compile('anisole'), re.compile('benzene'), re.compile('centralite'), re.compile('formamide'), re.compile('glycerine'), re.compile('nitrate'), re.compile('glycol'), re.compile('phthalate'), re.compile('picrate'), re.compile('toluene'), re.compile('methyl'), re.compile('(?<!m)ethyl'), re.compile('propyl'), re.compile('butyl'), re.compile(' '), re.compile('\\('), re.compile('\\)'), re.compile('hydroxyl'), re.compile('amin[oe]'), re.compile('amide')]
List of regular expressions to decompose an IUPAC name.
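A few of the listed patterns are less obvious than the plain substrings. The sketch below recompiles three of them on their own to show what they accept; the variable names and test strings are chosen for illustration and are not taken from the module.

```python
import re

# Locant prefix: two or more comma-separated numbers followed by a hyphen.
locants = re.compile(r"((\d+),?)+(\d+)-")
locant_text = locants.match("2,4,6-trinitrotoluene").group(0)

# The negative lookbehind keeps 'ethyl' from matching inside 'methyl'.
ethyl = re.compile("(?<!m)ethyl")
inside_methyl = ethyl.search("methylbenzene")
plain_ethyl = ethyl.search("ethylbenzene").group(0)

# The character class accepts both the 'amino' and 'amine' spellings.
amine = re.compile("amin[oe]")
both_spellings = [bool(amine.fullmatch(s)) for s in ("amino", "amine")]
```

Here `locant_text` is `'2,4,6-'`, `inside_methyl` is `None`, and `plain_ethyl` is `'ethyl'`.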
-
sort_array_by_name(array, name_col=0, reverse=False)
Sort a list of lists by the IUPAC name in each row.
-
sort_dataframe_by_name(df, name_col, reverse=False)
Sorts a pandas.DataFrame by the IUPAC name in each row.