Computational Toxicology Research Program

SDF Download Page


DBPCAN: EPA Water Disinfection By-Products with Carcinogenicity Estimates
Database File


** Version 4b, updated 15 February 2008:
Includes 2 new summary activity fields (ActivityOutcome, ActivityScore) to coordinate with PubChem DBPCAN Assay deposits.
Includes new InChIKey Standard Chemical Field.

Quick & Easy File Downloads: FTP Download Instructions

Description
Source Website & Contact
Main Citation
Guidance for Use
Version 4 Update
SDF Fields
SDF Content Summary
SDF Download Table
Acknowledgements, DSSTox Citation & Disclaimer

New Users: For general information, see DSSTox Project Goals and About DSSTox. For additional information on DSSTox SDF (Structure Data Format) files and their use in Chemical Relational Databases, see More on SDF and More on CRDs.

Description: The DBPCAN data file, derived from data published in the Main Citation (Woo et al., 2002), contains predicted estimates of carcinogenic potential for 209 chemicals detected in finished drinking water samples that had undergone water disinfection treatment. Since little or no health effects data exist for these disinfection by-product chemicals (DBPs), the goal of this exercise was for EPA scientists to provide informed estimates of carcinogenic potential to be used as one factor in ranking and prioritizing future monitoring, testing, and research needs in the drinking water area. The Main Citation and references therein detail the process of mechanism-based structure-activity relationship (SAR) analysis that was used for estimating concern levels for carcinogenicity. The primary authors of that study are codevelopers of OncoLogic™ (A Computer System to Evaluate the Carcinogenic Potential of Chemicals), a program that was used to provide some input into the expert judgment employed in the present study. A major feature of the expert, mechanism-based SAR approach is the identification of close structural analogs, within the same or similar chemical functional classes, for which some genotoxicity or carcinogenicity data or knowledge exists. The EPA has used this approach effectively for many years to screen chemicals with little existing data under the Premanufacture Notification program of the Toxic Substances Control Act. The approach has been affirmed by independent evaluation, including a prospective exercise predicting the outcomes of National Toxicology Program cancer bioassays. For further background and details of the approach, see Refs. 26-28 and 32-34 in the Main Citation and the OncoLogic™ information website.

The DBPCAN database consists primarily of relatively simple aliphatic organic structures, almost half halogenated, placed into one of 18 general chemical functional classes. CAS numbers were unavailable for nearly 20% of the chemicals in this database, indicating little to no prior study. Prior to analysis, a literature search for mutagenicity and other toxicity data was performed for the 209 DBPs to provide additional input to the SAR analysis. The DBPs are categorized by a semiquantitative ranking scale of high (H), high-moderate (HM), moderate (M), low-moderate (LM), marginal (mar), and low (L) levels of concern for potential carcinogenicity. The DBPCAN database lists the 209 chemicals and structures, along with their assigned chemical class (ChemClass_DBP), carcinogenic potential level of concern (ActivityConcernLevel_Carcinogenicity), and a short narrative summarizing the SAR rationale for the carcinogenicity estimate (ActivityConcernLevel_Rationale). Rationale narrative summaries were published in the Main Citation for 4 of the 18 chemical classes comprising the database (i.e., 57 out of the 209 DBPs); Rationale narratives for the remaining 157 chemicals were provided by author communication (Y.T. Woo) for incorporation into the DBPCAN SDF. The latter narratives are less well documented and referenced than those published in the Main Citation; hence, the source of the Rationale narrative is listed as either a numbered Table from the Main Citation or as "author communication" (ActivityConcernLevel_RationaleSource). For a subset of the carcinogenicity estimates, a close structural analog is identified in the Rationale narrative. Basic identifying information on this primary structural analog is provided in the fields: Analog_ChemicalName, Analog_CASRN, Analog_SMILES.
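The semiquantitative scale described above can be treated as an ordinal variable when ranking or filtering records. The following is a minimal Python sketch, not part of the DSSTox distribution: the abbreviations and level names come from the description, while the numeric ranks, the helper names (concern_rank, sort_by_concern), and the exact spelling of the full-text labels as they appear in the SDF are assumptions that should be checked against the Field Definition File.

```python
# Ordinal encoding of the concern-level scale (lowest to highest concern).
# Abbreviations and level names follow the description above; the numeric
# ranks and the exact label spellings are illustrative assumptions.
CONCERN_LEVELS = [
    ("L", "low"),
    ("mar", "marginal"),
    ("LM", "low-moderate"),
    ("M", "moderate"),
    ("HM", "high-moderate"),
    ("H", "high"),
]

# Accept either the abbreviation or the full-text label (case-insensitive).
RANK = {}
for rank, (abbrev, label) in enumerate(CONCERN_LEVELS):
    RANK[abbrev.lower()] = rank
    RANK[label] = rank


def concern_rank(value: str) -> int:
    """Return the ordinal rank (0 = low, 5 = high) for a concern level."""
    key = value.strip().lower()
    if key not in RANK:
        raise ValueError(f"unrecognized concern level: {value!r}")
    return RANK[key]


def sort_by_concern(records, field="ActivityConcernLevel_Carcinogenicity"):
    """Sort record dicts from highest to lowest concern level."""
    return sorted(records, key=lambda r: concern_rank(r[field]), reverse=True)
```

Accepting both forms is a convenience for working across file versions, since the Version 4 Update replaced the abbreviations in ActivityConcernLevel_Carcinogenicity with full-text entries.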


Source Contact: Scientific questions pertaining to this database should be directed to Yin-tak Woo, Risk Assessment Division, Office of Pollution Prevention & Toxics, US EPA, Washington, DC; email: woo.yintak@epa.gov

Source Website: Information pertaining to the OncoLogic™ program can be obtained from the EPA website URL: http://www.epa.gov/oppt/sf/pubs/oncologic.htm (updated 09Aug2011)

Main Citation: Publications reporting use of DBPCAN DSSTox structure data files are asked to list the full DSSTox file name, including date stamp, and to cite as primary reference the following:

Woo, Y.T., D. Lai, J.L. McLain, M.K. Manibusan, and V. Dellarco (2002) Use of mechanism-based structure-activity relationships analysis in carcinogenic potential ranking for drinking water disinfection by-products, Environ. Health Perspect., 110 Suppl 1: 75-87.

Download PDF (PDF, 13 pp, 600 KB)

You will need Adobe Acrobat Reader, available as a free download, to view the Adobe PDF files on this page. See EPA's PDF page to learn more about PDF, and for a link to the free Acrobat Reader.

Guidance for Use: This database is offered publicly to convey a mechanism-based structure-activity approach for estimating carcinogenic potential, to communicate the types of informed rationale of scientific experts upon which reasonable estimates can be based, and to provide carcinogenicity estimates for a set of chemicals for which little or no carcinogenicity data currently exist. These carcinogenicity rankings, however, are estimates based on human expert judgement, not experimental data, and should be used with proper caution. Since this database was primarily abstracted from a published journal article, the Main Citation should be consulted and referenced in any use of this database. In addition, the DBPCAN Field Definitions File, offered below, provides essential documentation for the DBPCAN SDF and should accompany any download and use of this database. The DBPCAN Log File provides SDF file summary information (field, chemical counts, etc.), a description of procedures and quality assurance checks used in SDF file creation, and a listing of missing CASRN or Structure information in the SDF file. The Log File also documents modifications incorporated into version/revision updates of the DSSTox DBPCAN SDF file. To report errors in any DBPCAN documentation or data file, use the File Error Report link in the Download Table below. For additional information on DSSTox SDF files and their use in Chemical Relational Databases, see More on SDF and More on CRDs.

Version 4 Update: DBPCAN_v4a has no new chemical records but has several minor QA corrections, field entry revisions, field changes, new CASRN, etc. Changes to DSSTox Standard Chemical Fields include new ID fields: DSSTox_RID, DSSTox_Generic_SID and DSSTox_FileID (replacing DSSTox_SID and DSSTox_ID_FileName - see More on Standard Chemical Fields), and entries in the TestSubstance_Description field have been simplified. Contents of TestSubstance_ChemicalName_Other, which has been retired, have been moved to the Note_DBPCAN field, which has replaced the ToxicityNote field. Also, abbreviations in the ActivityConcernLevel_Carcinogenicity field have been replaced with full text entries.

For more information and version history, consult the DBPCAN_LogFile in the Download Table below.

Version 4b Update: DBPCAN_v4b includes two new summary activity fields, ActivityOutcome_DBPCAN and ActivityScore_DBPCAN, for use in PubChem and structure-activity relationship studies. In addition, the new STRUCTURE_InChIKey field (25-character abbreviated InChI for use in structure-indexing applications) has been added as a DSSTox Standard Chemical Field to all DSSTox files.



DBPCAN SDF Fields (31 total)*

DSSTox Standard Chemical Fields (19) * STRUCTURE_InChIKey field added in v4b

DSSTox Standard Toxicity Fields (2)

DBPCAN Source Fields (10)

ChemClass_DBP
ActivityOutcome_DBPCAN *new to v4b
ActivityScore_DBPCAN *new to v4b
ActivityConcernLevel_Carcinogenicity
ActivityConcernLevel_Rationale
ActivityConcernLevel_RationaleSource
Analog_ChemicalName
Analog_CASRN
Analog_SMILES
Note_DBPCAN

* Note: For detailed information on SDF content, see DBPCAN_FieldDefinitionFile in Download Table below.
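As an illustration of how the fields listed above might be read from the SDF without a cheminformatics toolkit, the sketch below parses the data items of each record with the Python standard library only. It assumes the DBPCAN SDF follows the usual SDF layout (each data item introduced by a header line of the form "> <FieldName>", values on the following lines up to a blank line, and records terminated by a "$$$$" line) and that no line of the structure block starts with ">". The function name iter_sdf_records is hypothetical, and the structure block itself is skipped.

```python
import re

# Data-item header line, e.g. "> <ChemClass_DBP>" or ">  <Note_DBPCAN>".
FIELD_HEADER = re.compile(r"^>.*<([^<>]+)>")


def iter_sdf_records(lines):
    """Yield one {field name: value} dict per SDF record.

    `lines` is any iterable of text lines (for example an open file).
    Only data items are collected; the structure block is ignored.
    """
    fields, name = {}, None
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.strip() == "$$$$":          # end of the current record
            yield {k: "\n".join(v) for k, v in fields.items()}
            fields, name = {}, None
            continue
        match = FIELD_HEADER.match(line)
        if match:                           # start of a new data item
            name = match.group(1)
            fields[name] = []
        elif name is not None:
            if line.strip():                # value line (may be multi-line)
                fields[name].append(line)
            else:                           # blank line closes the data item
                name = None
```

A typical use would be to open the downloaded SDF in text mode, pass the file object to iter_sdf_records, and read fields such as ChemClass_DBP or ActivityConcernLevel_Carcinogenicity from each resulting dict. Field names and definitions should be taken from the DBPCAN_FieldDefinitionFile rather than from this sketch.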



DBPCAN SDF Content Summary - 15 February 2008

DBPCAN SDF Content                  Totals_v3b  Totals_v4a  Totals_v4b
# Records                           209         209         209
DSSTox Standard Chemical Fields     18          18          19
DSSTox Standard Toxicity Fields     2           2           2
DBPCAN Source Fields                8           8           10
Total # Fields                      28          28          31

Chemical Content                    Counts_v3b  Counts_v4a  Counts_v4b
defined organic                     204         204         204
inorganic                           5           5           5
organometallic                      0           0           0
no structure                        0           0           0

STRUCTURE_TestedForm_DefinedOrganic:
parent                              204         204         204
complex                             0           0           0
salt                                0           0           0
salt complex                        0           0           0

TestSubstance_Description:
single chemical compound            208         208         207
defined mixture or formulation      1           * (NA)      * (NA)
undefined mixture                   0           * (NA)      * (NA)
macromolecule                       0           0           0
mixture or formulation              * (NA)      1           2

* (NA) = field entry not applicable for DSSTox file version indicated
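The counts in the content summary can be cross-checked arithmetically: for each version, the three field groups should sum to the total number of fields, and both the chemical-content categories and the TestSubstance_Description categories should sum to the number of records. The short Python sketch below performs that check on values transcribed from the summary above; the dictionary layout and the function name check_version are illustrative assumptions, and entries marked (NA) are simply left out.

```python
# Counts transcribed from the content summary above; (NA) entries omitted.
SUMMARY = {
    "v3b": {
        "records": 209,
        "fields": {"chemical": 18, "toxicity": 2, "source": 8},
        "total_fields": 28,
        # defined organic, inorganic, organometallic, no structure
        "structure_classes": [204, 5, 0, 0],
        # single compound, defined mixture, undefined mixture, macromolecule
        "descriptions": [208, 1, 0, 0],
    },
    "v4a": {
        "records": 209,
        "fields": {"chemical": 18, "toxicity": 2, "source": 8},
        "total_fields": 28,
        "structure_classes": [204, 5, 0, 0],
        # single compound, macromolecule, mixture or formulation
        "descriptions": [208, 0, 1],
    },
    "v4b": {
        "records": 209,
        "fields": {"chemical": 19, "toxicity": 2, "source": 10},
        "total_fields": 31,
        "structure_classes": [204, 5, 0, 0],
        "descriptions": [207, 0, 2],
    },
}


def check_version(row):
    """Return a list of inconsistencies found in one version's counts."""
    problems = []
    if sum(row["fields"].values()) != row["total_fields"]:
        problems.append("field groups do not sum to total fields")
    if sum(row["structure_classes"]) != row["records"]:
        problems.append("chemical content does not sum to record count")
    if sum(row["descriptions"]) != row["records"]:
        problems.append("descriptions do not sum to record count")
    return problems


for version, row in SUMMARY.items():
    print(version, check_version(row) or "consistent")
```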

Note: For SDF content summary information on previous versions of DBPCAN, view DBPCAN_LogFile in Download Table below.


File Download Notes: The following files are offered in the Download Table below:

Log File (PDF) provides SDF data file version history and summary information (field, chemical counts, etc.), and a description of procedures and quality assurance checks used in SDF file creation;
Field Definition File (PDF or MS Word doc file) provides field definitions and essential documentation, and should be downloaded with and accompany any use of the DSSTox SDF file;
Structure Data File (SDF) is the main DSSTox product, providing the complete inventory of chemical structures, DSSTox Standard Chemical Fields, and all Source-specific data fields [Note: the structure field is blank for all records containing mixtures or undefined substances];
Data Table (MS Excel, MS Office 2003) file contains the full SDF data contents in spreadsheet table form, minus the chemical structure field [file created with CambridgeSoft ChemFinder plug-in to MS Excel 2003];
Structures Table (PDF) file contains a tiled format graphical view of all chemical structures contained in the SDF file, annotated with TestSubstance_CASRN and truncated TestSubstance_ChemicalName field entries for the tested form of the chemical [file created with ACD ChemFolder, ver. 10.01, ACD Labs].


File Types          Description                  File Size   Format
Document Files      Log File                     41KB        PDF
                    Field Definition File        34KB        PDF
                    Field Definition File        91KB        MS Word
Data Files: DBPCAN  SDF Structure Data File      450KB       SDF
                    Data Table (no structures)   157KB       MS Excel
                    Structures Table             637KB       PDF

File Error Report: link to submit error report form

These files constitute the main DSSTox products. DSSTox Documentation Files use standard templates, and DSSTox Structure Data Files and DSSTox File Names adhere to strict formatting standards and conventions. For additional information, see More on DSSTox Standard Chemical Fields, Known Problems & Fixes, Chemical Information Quality Review Procedures, and How to Use DSSTox Files.

Quick & Easy File Downloads: FTP Download


Acknowledgements: The original DSSTox SDF file for the EPA DBPCAN database was created by ClarLynda Williams (EPA/NC Central Univ Student COOP; EPA), Nina Fields (Shaw Univ High School Student Mentoring Program), and Ann Richard (EPA). Assistance from the Source collaborator, Yin-tak Woo, in providing supplementary Source data and for careful review of this database and documentation is also gratefully acknowledged. We thank Doug Wolf (EPA) and Susan Richardson (EPA) for reviewing the documentation materials for v1a and Leonard Mole (EPA) for data file review and error reports. All subsequent QA review and structure modifications to DBPCAN_v2 and later versions were carried out by Maritja Wolf (Lockheed Martin, Contractor for EPA).

DSSTox Citation: Woo, Y.T., C.R. Williams, N. Fields, and A.M. Richard (2008) DSSTox EPA Water Disinfection By-Products with Carcinogenicity Estimates Database (DBPCAN): SDF Files and Documentation, Updated version: DBPCAN_v4b_209_15Feb2008, www.epa.gov/ncct/dsstox/sdf_dbpcan.html
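Because publications are asked to cite the full DSSTox file name including its date stamp, it can be useful to split a name such as DBPCAN_v4b_209_15Feb2008 into its parts. The Python sketch below assumes the pattern name_vVersion_RecordCount_DDMonYYYY, which is inferred only from the single example in the citation above and is not a statement of the official DSSTox naming standard; the names FileStamp and parse_stamp are hypothetical.

```python
import re
from datetime import date
from typing import NamedTuple

# English month abbreviations, mapped by hand to avoid locale dependence.
MONTHS = {
    name: number
    for number, name in enumerate(
        ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
         "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"],
        start=1,
    )
}

# Assumed pattern, inferred from "DBPCAN_v4b_209_15Feb2008".
STAMP = re.compile(
    r"^(?P<name>[A-Za-z0-9]+)_v(?P<version>\d+[a-z]?)"
    r"_(?P<records>\d+)_(?P<day>\d{2})(?P<mon>[A-Za-z]{3})(?P<year>\d{4})$"
)


class FileStamp(NamedTuple):
    name: str        # database abbreviation, e.g. "DBPCAN"
    version: str     # version/revision label, e.g. "4b"
    records: int     # number of records in the file
    updated: date    # date stamp


def parse_stamp(stamp: str) -> FileStamp:
    """Split a DSSTox-style file name stamp into its components."""
    match = STAMP.match(stamp)
    if match is None:
        raise ValueError(f"unrecognized file name stamp: {stamp!r}")
    month = MONTHS.get(match["mon"].capitalize())
    if month is None:
        raise ValueError(f"unrecognized month in stamp: {stamp!r}")
    updated = date(int(match["year"]), month, int(match["day"]))
    return FileStamp(match["name"], match["version"],
                     int(match["records"]), updated)
```

Parsing the stamp this way allows the record count in the file name (209 here) to be compared against the number of records actually read from the SDF.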

Disclaimer: Every effort is made to ensure that DSSTox SDF files and associated documentation are error-free, but neither the DSSTox Source collaborators nor the EPA DSSTox project team make guarantees of accuracy, nor are any of these persons to be held liable for any subsequent use of these public data. The contents of this webpage and supporting documents have been subjected to review by the National Health and Environmental Effects Research Laboratory and approved for publication. Approval does not signify that the contents reflect the views of the Agency, nor does mention of trade names or commercial products constitute endorsement or recommendation for use. See additional disclaimers.
