Computational Toxicology Research Program

Searching DSSTox Files with PubChem

Note: All current DSSTox substances and Data Files have been deposited in PubChem, including one file (CPDBAS) deposited as multiple PubChem Assays. The contents of this page will be updated soon with instructions for searching PubChem and extracting PubChem bioassay data for DSSTox compound lists. Also, see the DSSTox Structure Browser Information Page (features new to v2.0) for information on the new PubChem External link.

PubChem is a very large and growing public, on-line chemical database resource (over 10 million unique chemical data entries) that invites chemical structure-annotated data submissions, preferably of bioassay summary data. PubChem was developed under the NLM National Center for Biotechnology Information (NLM/NCBI) to serve as the primary repository and public delivery mechanism for chemically-indexed High Throughput Screening (HTS) bioassay data to be generated under the NIH Molecular Libraries & Imaging Roadmap (MLR) Initiative.

Users are referred to the NCBI PubChem website for details on the database content, instructions for data submission, and multiple search options (see also More Information on PubChem for links to PubChem Help and Instruction pages). For more information on DSSTox collaborations pertaining to PubChem, see Coordinating Public Efforts.

More specific information pertaining to locating and searching DSSTox data records contained in PubChem is provided on this page for the topic areas listed below.

Current postings of DSSTox data files in PubChem

Shortly after publication of new or updated DSSTox Structure Data Files on this website, SDF files and associated documentation will be deposited into PubChem. Existing DSSTox_Generic_SID records will be updated with new information, and new DSSTox_Generic_SID records not occurring in previously published DSSTox SDF files will be assigned new unique PubChem SIDs. All DSSTox SDF files will be fully structure-searchable in PubChem.

DSSTox SDF files for 5 published databases were originally uploaded into PubChem in Nov 2005. See PubChem modifications for a list of changes to these files required for posting. For various PubChem search options, see other topic areas on this page. These earlier versions of current published files were provided to the PubChem project after major quality review and revision of DSSTox Standard Chemical Fields. PubChem staff provided invaluable assistance with the initial deposit of DSSTox SDF files. The most current published DSSTox Structure Data Files (to be deposited in PubChem) have undergone additional, continuing QA review and include a large number of minor corrections and modifications.

PubChem collaboration with DSSTox

In our initial collaboration, five published DSSTox SDF files were provided to PubChem staff (special thanks to Jane Tseng), who processed, transformed and posted these files on PubChem in structure-searchable form (see PubChem Announcements, Nov 22, 2005). See also PubChem modifications of DSSTox file content.

DSSTox Standard Chemical Fields were revised and expanded to include the DSSTox_Generic_SID field (i.e., unique chemical substance IDs) and DSSTox_CID field (i.e., unique chemical structure IDs) in August 2005, in part to create compatibility with the PubChem data management model and to ensure consistent representation of chemical substances and structures across DSSTox files. Extensive quality review of DSSTox files accompanied these changes to provide more consistent structure and test substance representation across present and future published DSSTox files. See also Chemical Information Quality Review Procedures. See also Coordinating Public Efforts.

PubChem Contacts:
Stephen Bryant, email: bryant@ncbi.nlm.nih.gov
Jane Tseng, email: jatseng@ncbi.nlm.nih.gov
Yanli Wang, email: ywang@ncbi.nlm.nih.gov

PubChem modifications of DSSTox file content

Some abbreviation and modification of DSSTox file content is required to conform to PubChem chemical management and summary bioassay display options. These changes include:

Assignment of PubChem Substance IDs (SID) having a 1:1 correspondence to DSSTox_Generic_SID but not identical to the latter. DSSTox_Generic_SID are used for DSSTox database construction and internal data management, whereas PubChem SIDs are assigned after the DSSTox database is deposited into PubChem.

Correspondence of DSSTox SDF structures to closest match of PubChem CID structure (from NLM ChemID Plus) for internal referencing and display; not necessarily identical to DSSTox SDF "mol file" structure (e.g., DSSTox structures may be more stereochemically specific than provided in the ChemID Plus library).

PubChem abstract of DSSTox SDF Download Page and corresponding DSSTox Field Definition content for inclusion in PubChem "Bioassay" summary description (see sample Bioassay search below for EPAFHM).

Note: PubChem provides structure-searching access to DSSTox data files in the larger context of the entire PubChem bioassay data repository. However, due to slightly modified content and limited documentation provided in PubChem versions of DSSTox data files, we recommend that users wishing to download the most current DSSTox SDF files, in their entirety, do so from this website.
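For users who download DSSTox SDF files from this website, the DSSTox Standard Chemical Fields (such as DSSTox_Generic_SID and DSSTox_CID) can be read as ordinary SDF data items. The following is a minimal Python sketch, not an official DSSTox tool: the function name parse_sdf_fields and the sample record (including its ID values) are hypothetical, and the parser assumes the usual SDF layout in which each data item begins with a ">  <FieldName>" line, ends at a blank line, and records are separated by "$$$$".

```python
def parse_sdf_fields(sdf_text):
    """Return one dict of data-item fields per SDF record (structure block is skipped)."""
    records = []
    for block in sdf_text.split("$$$$"):
        if not block.strip():
            continue  # nothing after the final record separator
        fields = {}
        name = None
        for line in block.splitlines():
            if line.startswith(">"):
                # Data header line, e.g. ">  <DSSTox_Generic_SID>"
                start = line.find("<")
                end = line.find(">", start + 1)
                name = line[start + 1:end] if 0 <= start < end else None
                if name is not None:
                    fields[name] = []
            elif name is not None:
                if line.strip():
                    fields[name].append(line.strip())
                else:
                    name = None  # a blank line closes the current data item
        records.append({key: "\n".join(value) for key, value in fields.items()})
    return records


# Hypothetical one-record sample; the field values are placeholders, not real DSSTox IDs.
sample = """\
example-1
  hypothetical

  0  0  0  0  0  0  0  0  0  0999 V2000
M  END
>  <DSSTox_Generic_SID>
20001

>  <DSSTox_CID>
1

$$$$
"""

records = parse_sdf_fields(sample)
# Index the records by DSSTox_Generic_SID, the key that corresponds 1:1 to a PubChem SID.
by_generic_sid = {rec["DSSTox_Generic_SID"]: rec for rec in records}
```

Because PubChem SIDs are assigned only after deposit, a lookup table from DSSTox_Generic_SID to PubChem SID would have to be built from PubChem's own records; the sketch above only reads the DSSTox side.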

Access DSSTox chemical substance records in PubChem

A full list of DSSTox records can be obtained from the main PubChem search page:

PubChem Text Search option: "PubChem Substance"

Text entry: "EPA DSSTox"[sourcename]

Search Returns:

A listing of the current 4258 DSSTox substance records in PubChem as of November 2006.
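The same text query can also be issued programmatically. The sketch below assumes the NCBI Entrez E-utilities esearch service and the Entrez database names pcsubstance and pcassay; it reuses the search term given on this page, but whether that term still returns the DSSTox records is an assumption, not something verified here. The helper names build_esearch_url and fetch_id_list are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# NCBI Entrez E-utilities search endpoint.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# Search term as given on this page.
DSSTOX_TERM = '"EPA DSSTox"[sourcename]'


def build_esearch_url(db, term, retmax=20):
    """Build an esearch URL for one Entrez database and one text query."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return EUTILS_ESEARCH + "?" + urlencode(params)


def fetch_id_list(url):
    """Fetch an esearch URL and return the list of matching IDs (needs network access)."""
    with urlopen(url) as response:
        return json.load(response)["esearchresult"]["idlist"]


# Substance records (PubChem SIDs) and bioassay records (AIDs) for the same source.
substance_url = build_esearch_url("pcsubstance", DSSTOX_TERM)
assay_url = build_esearch_url("pcassay", DSSTOX_TERM)
```

Calling fetch_id_list(substance_url) would return at most retmax IDs; raising retmax, or paging with the esearch retstart parameter, is needed to retrieve a full list.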

PubChem Bioassay Search

A summary page of DSSTox data files distinguished by "Bioassay" can be obtained from the main PubChem page:

PubChem Text Search option: "PubChem Bioassay"

Text entry: "EPA DSSTox"[sourcename] or "dsstox"

Search Returns:

Clicking on BioAssay ID (AID) 356 in the search results yields a description of the DSSTox database and bioassay activity measures abstracted from the DSSTox SDF Download Page (e.g., EPAFHM).

Locating DSSTox records by PubChem structure-search

On the main PubChem page, choose the option "Structure Search" from the blue option box.

Structure Search: Search PubChem's Compound database using a chemical structure as the query. Structures may be sketched or specified by SMILES, MOL files, or other formats.


Sample Chemical Search Result:

Select "Bioactivity: 2 Links" to give Bioassay Summary result:

If the compound matches a bioassay result in a DSSTox database within PubChem, the result will be listed in this table, indexed by the DSSTox database name, in this case, CPDBAS.

Select "Assay ID (AID)" 352 to display Bioassay Summary Page (see, e.g., Bioassay Search).

Select "View" Data to display bioassay data for Structure:

Select "Links" view to access chemical-specific data page for Acetaldehyde methylformylhydrazone on CPDB Source Website.

Select "Name:" Carcinogenic Potency Database (CPDBAS) to link to DSSTox SDF Download Page for CPDBAS.
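The structure lookup and bioassay summary steps above can be approximated in a script. The sketch below assumes PubChem's PUG REST web service, which is not described on this page; it only builds request URLs for (1) resolving a SMILES string to PubChem CIDs by identity match and (2) requesting the bioassay summary for a CID. The helper names, the example SMILES, and the placeholder CID are illustrative assumptions.

```python
from urllib.parse import quote

# Base address of the PubChem PUG REST service (assumed; not part of the original page).
PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def cids_for_smiles_url(smiles):
    """URL that asks PubChem for the CIDs matching a SMILES string."""
    # Percent-encode every reserved character so the SMILES is safe in a URL path.
    return f"{PUG_REST}/compound/smiles/{quote(smiles, safe='')}/cids/JSON"


def assay_summary_url(cid):
    """URL that asks PubChem for the bioassay summary table of one compound."""
    return f"{PUG_REST}/compound/cid/{int(cid)}/assaysummary/JSON"


# Example query structure (acetaldehyde written as SMILES) and a placeholder CID.
lookup_url = cids_for_smiles_url("CC=O")
summary_url = assay_summary_url(12345)  # 12345 is a placeholder, not a real lookup result
```

In the returned summary, rows whose assay ID matches a DSSTox deposit (e.g., AID 352 in the CPDBAS example above) would correspond to the entries shown in the Bioassay Summary table.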

More Information on PubChem

PubChem On-line Help and Instructions

"Reactive Reports" Interview with Steve Bryant, Director of PubChem Project

Wikipedia's entry on PubChem

ChemBioGrid - Chemical Informatics & Cyberinfrastructure collaboratory: Chemistry Resources on the Web

