Computational Toxicology Research Program

SDF Download Page


ARYEXP: European Bioinformatics Institute (EBI) ArrayExpress Repository for Gene Expression Experiments
Structure-Index Locator File

** Updated Version 2a DSSTox Structure-Index Locator File, 06 March 2009 (Source website content extracted 20Jan2009)

Quick & Easy File Downloads: FTP Download Instructions

Description
Auxiliary Data File (ARYEXP_Aux_v2a)
Source Website & Contact
Main Citation
Guidance for Use
SDF Fields
Version 2 Update **
SDF Content
SDF Download Table
Acknowledgements, DSSTox Citation & Disclaimer

New Users: For general information, see DSSTox Project Goals and About DSSTox. For additional information on DSSTox SDF (Structure Data Format) files and their use in Chemical Relational Databases, see More on SDF and More on CRDs.


Description: The European Bioinformatics Institute (EBI) ArrayExpress Repository is a public repository for transcriptomics and gene expression data that supports use of the MIAME guidelines in accordance with Microarray Gene Expression Data Society (MGED) recommendations. Since ArrayExpress went online in 2002, the repository has grown to more than 7,700 experiments as of 2009. Public data in ArrayExpress are available for browsing and querying by experiment properties, submitter, species, etc. Queries return summaries of experiments, and complete data or subsets can be retrieved (http://www.ebi.ac.uk/microarray-as/aer/entry). Recent additions to ArrayExpress include new portals for programmatic access, through which users can query and download data in a systematic or automated manner from the ArrayExpress FTP site.

The EBI ArrayExpress Repository and the National Center for Biotechnology Information (NCBI) GEO Series (see accompanying DSSTox file, GEOGSE) are the two main public repositories of gene expression data and microarray experiments associated with the scientific literature. Deposition of data into one of these two resources is now a standard precondition for journal publication of microarray studies. At the time of this writing, neither resource has standard requirements for reporting of chemical information associated with submitter-deposited experiments. As a result, it has until now been difficult to assess the chemical-related content or, more specifically, the chemical exposure-related content in these resources, and microarray experiments have remained isolated from other public sources of chemically indexed information pertaining to toxicology. This DSSTox project was undertaken to use chemical information linkages to contribute to building a public toxicogenomics capability and to encourage the application of structure-activity relationship (SAR) concepts to gene expression data where sufficient comparable experiments on chemical analogs are available.

The DSSTox ARYEXP data file is a chemical-index file of unique chemical substances pertaining to the chemical exposure-related experimental content (identified by us as Chemical_StudyType = "Treatment") within the ArrayExpress Repository as of the date of data extraction (see Note). The chemical exposure-related content of the ArrayExpress Repository was identified through a series of automated methods that filtered for characteristics such as experimental design type (compound treatment, dose response, or time course); the occurrence of keywords (compound, chemical, treatment, drug, etc.) in the experimental description category; or the occurrence of specific accession numbers (such as TOXM). These automated methods, however, were insufficient and had to be supplemented by extensive manual curation and review of the chemical content extracted from ArrayExpress fields and free-text description entries from submitters (Williams-Devane et al. 2009). The final DSSTox ARYEXP file contains the full complement of DSSTox Standard Chemical Fields for each unique substance, as well as URL link(s) to one or more chemical-specific Experiment_Accession number data page(s) within the ArrayExpress Repository. All ArrayExpress Experiment Accession numbers pertaining to the same chemical substance (i.e., the same DSSTox_Generic_SID) are listed in the Experiment_Accession field in the same ARYEXP chemical record.
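The three automated criteria described above (design type, description keywords, accession number) can be illustrated with a short Python sketch. This is not the project's actual extraction code, which is documented in the Main Citations. The record layout, the function name is_candidate, and the accession IDs other than E-TOXM-31 are assumptions made for illustration only.

```python
import re

# Filter terms taken from the examples named in the text above.
DESIGN_TYPES = {"compound treatment", "dose response", "time course"}
KEYWORDS = ("compound", "chemical", "treatment", "drug")
# Assumed prefix form, inferred from the example accession E-TOXM-31.
ACCESSION_PREFIXES = ("E-TOXM-",)


def is_candidate(experiment: dict) -> bool:
    """Return True if any of the three automated criteria flags the record."""
    design = {d.strip().lower() for d in experiment.get("design_types", [])}
    if design & DESIGN_TYPES:
        return True
    description = experiment.get("description", "").lower()
    if any(re.search(rf"\b{re.escape(k)}", description) for k in KEYWORDS):
        return True
    return experiment.get("accession", "").startswith(ACCESSION_PREFIXES)


# Hypothetical records; the E-XXXX-n accession IDs are placeholders.
experiments = [
    {"accession": "E-TOXM-31", "design_types": [], "description": ""},
    {"accession": "E-XXXX-1", "design_types": ["dose response"], "description": ""},
    {"accession": "E-XXXX-2", "design_types": [], "description": "Drug exposure in liver"},
    {"accession": "E-XXXX-3", "design_types": ["strain comparison"], "description": "Wild type vs mutant"},
]

candidates = [e["accession"] for e in experiments if is_candidate(e)]
print(candidates)
```

A filter of this kind only produces candidates. As noted above, the automated methods were insufficient on their own and the flagged content still required manual curation and review.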

The DSSTox ARYEXP chemical index file has been incorporated into the DSSTox Structure-Browser and deposited into PubChem, enabling a user to locate particular chemical-associated experiments or those associated with close chemical analogs through a structure similarity search.

ARYEXP Auxiliary Data File: During the course of this project, a large amount of chemical-associated information of potential use for toxicogenomics investigations was initially curated from the full ArrayExpress Repository file. Prior to identifying chemical exposure-related ArrayExpress content (i.e., Treatment vs. other uses, such as Reference, Vehicle, Media, etc.), we created a full listing of ArrayExpress Repository chemical-experiment pairs (i.e., one record per Experiment Accession number, with some DSSTox_Generic_SID substances spanning multiple records and experiments), along with a full complement of summary experimental descriptors and indices provided by ArrayExpress. These summary experimental fields include MIAME score elements, species, array type, number of samples, etc., as well as URL linkages to raw data. This content is contained in the Auxiliary Data File (ARYEXP_Aux) offered in the Download Table below in SD or table format. The file contains the full complement of DSSTox Standard Chemical Fields, as well as 44 Source-specific content fields from ArrayExpress experiment annotations (an MS Word doc file listing all fields and their definitions is also included in the Download Table below). The content of these files will be incorporated, along with the GEOGSE files, into the Chemical Effects in Biological Systems (CEBS) database and is being provided to the EBI ArrayExpress project in the hope of improving chemical annotation and data linkages of public gene expression resources in the future.
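The relationship between the two files, with ARYEXP_Aux holding one record per chemical-experiment pair and ARYEXP holding one record per unique substance with its "Treatment" experiments, can be sketched in Python as follows. This is an illustrative sketch only: the field names come from this page, but the SID values, the E-XXXX-n accession IDs, and the function name index_by_substance are made up.

```python
from collections import defaultdict

# Made-up chemical-experiment pairs. Only the field names and "E-TOXM-31"
# come from this page; the SID values and other accession IDs are placeholders.
pairs = [
    {"DSSTox_Generic_SID": 1001, "Experiment_Accession": "E-TOXM-31", "Chemical_StudyType": "Treatment"},
    {"DSSTox_Generic_SID": 1001, "Experiment_Accession": "E-XXXX-7", "Chemical_StudyType": "Vehicle"},
    {"DSSTox_Generic_SID": 1001, "Experiment_Accession": "E-XXXX-9", "Chemical_StudyType": "Treatment"},
    {"DSSTox_Generic_SID": 1002, "Experiment_Accession": "E-XXXX-9", "Chemical_StudyType": "Treatment"},
]


def index_by_substance(pairs, study_type="Treatment"):
    """Map each substance ID to the ordered list of its matching accessions."""
    grouped = defaultdict(list)
    for pair in pairs:
        if pair["Chemical_StudyType"] != study_type:
            continue  # skip Reference, Vehicle, Media, etc.
        accessions = grouped[pair["DSSTox_Generic_SID"]]
        if pair["Experiment_Accession"] not in accessions:
            accessions.append(pair["Experiment_Accession"])
    return dict(grouped)


index = index_by_substance(pairs)
print(index)
```

In this toy input, one experiment (E-XXXX-9) is mapped to two different substances, so it contributes two chemical-experiment pairs even though it is a single Experiment Accession ID.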


Source Website: EBI ArrayExpress is located online at http://www.ebi.ac.uk/microarray-as/aer.

Note: The EBI ArrayExpress Repository is regularly updated; the DSSTox ARYEXP_v2a content represents a snapshot of the chemical exposure-related content of that repository extracted on 20Jan2009 (v1a corresponded to data extraction on 20Sep2008).

 

Source Contact: Contact ArrayExpress staff at arrayexpress@ebi.ac.uk.

 

Main Citations: For more information on this project and procedures used to extract data and chemically annotate gene expression experiments in the two main public repositories, ArrayExpress and GEO, see:

Williams-Devane, C.R., M.A. Wolf, and A.M. Richard (2009) DSSTox Chemical-index Files for Exposure-Related Experiments in ArrayExpress and Gene Expression Omnibus: Enabling Toxico-chemogenomics Data Linkages, Bioinformatics, 25:692-694.

Williams-Devane, C.R., M.A. Wolf, and A.M. Richard (2009) Towards a public toxicogenomics capability for supporting predictive toxicology: Survey of current resources and chemical indexing of experiments in GEO and ArrayExpress, Toxicological Sciences, in press.


Guidance for Use: ARYEXP represents a departure from previously published DSSTox data files, which either contain toxicology data of potential use for structure-activity relationship (SAR) modeling, or are high-interest chemical inventories for environmental toxicology from the EPA or National Toxicology Program. This is the first DSSTox file to chemically index a public repository of microarray experiments of potential use for toxicogenomics investigations. The DSSTox ARYEXP file is an inventory of unique chemical substances, with each chemical mapped to one or more experiments contained within the ArrayExpress Repository and, in each case, chemical exposure (or treatment) is deemed a primary objective of the experiment. The file was created to encourage consideration of chemical structure and chemical similarity as an organizing principle for such data, to aid in association of common gene expression patterns, and to aid in the aggregation of multiple data types for potential toxicogenomics investigation. Users should be aware that the chemically indexed experimental content of the public ArrayExpress Repository spans a large diversity of treatment conditions, species, array types, data annotation, laboratories, etc. Hence, data aggregation by chemical or chemical similarity must also consider and attempt to control for these many variables in a public repository. An auxiliary data file, ARYEXP_Aux, is offered for download that includes a larger set of chemical-experiment pairs (including all categories of chemical experiment association) for the ArrayExpress Repository and 44 additional data fields.

 

ARYEXP SDF Fields (26 total):

DSSTox Standard Chemical Fields (20)

StudyType

Source_ChemicalName (new field added Feb2009)

Note_ARYEXP
Chemical_StudyType
Experiment_Accession
Experiment_URL

e.g., Acetonitrile: E-TOXM-31
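For readers who want to inspect these fields without a chemistry toolkit, the following Python sketch reads the data items of an SDF file using only the standard library. It relies on the general SD file layout, in which each data item starts with a header line such as "> <Experiment_Accession>", is followed by its value and a blank line, and each record ends with a "$$$$" line. The function name read_sdf_fields and the sample record are illustrative assumptions. The sample pairs Acetonitrile with E-TOXM-31 as in the example above, with the structure block left empty for brevity.

```python
import re

# Matches an SDF data-item header such as "> <Experiment_Accession>".
FIELD_HEADER = re.compile(r"^>.*<([^<>]+)>")


def read_sdf_fields(lines):
    """Yield one {field name: value} dict per SDF record, skipping the structure block."""
    fields, current = {}, None
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.strip() == "$$$$":  # end-of-record delimiter
            yield {name: "\n".join(value) for name, value in fields.items()}
            fields, current = {}, None
            continue
        header = FIELD_HEADER.match(line)
        if header:
            current = header.group(1)
            fields[current] = []
        elif current is not None:
            if line.strip():
                fields[current].append(line)
            else:  # a blank line closes the current data item
                current = None


# Hypothetical single-record sample with an empty structure block.
sample = [
    "",
    "  hypothetical-record",
    "",
    "  0  0  0  0  0  0  0  0  0  0999 V2000",
    "M  END",
    "> <TestSubstance_ChemicalName>",
    "Acetonitrile",
    "",
    "> <Experiment_Accession>",
    "E-TOXM-31",
    "",
    "$$$$",
]

records = list(read_sdf_fields(sample))
print(records)
```

To read a downloaded file, the same function can be given an open file handle, for example read_sdf_fields(open(path)), where path points to the unzipped SDF.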


Version 2 Update: ARYEXP_v2a and ARYEXP_Aux_v2a contain updated content extracted from the ArrayExpress website as of 20Jan2009 (v1a corresponded to data extraction on 20Sep2008). The method of data extraction and file construction is documented in the Main Citations. A total of 191 new experiments were determined to be associated with a chemical substance and were included in the updated ARYEXP_Aux_v2a file. Of these, 163 were labeled by us as chemical "treatment" experiments, and these new experiments correspond to 74 new unique chemicals (3 chemicals were deleted from v1a, leaving a net total of 71 new unique chemical records associated with "treatment" experiments). Hence, 163 new chemical treatment experiment links are provided in PubChem (for a total of 1999 PubChem chemical-experiment pair entries), 71 new chemicals with links to one or more experiments have been added to the ARYEXP_v2a structure-index file, and a total of 161 new URLs to experiments were added. The chemical content totals for ARYEXP_v1a and v2a are summarized in the table below.
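The update counts quoted above can be cross-checked against the v1a and v2a totals in the content summary table with a few lines of Python. All numbers are taken from this page; the variable names are illustrative.

```python
# v1a totals from the content summary table.
v1a = {"unique_chemicals": 887, "treatment_pairs": 1836}

# Changes reported for the v2a update.
new_chemicals = 74
deleted_chemicals = 3
new_treatment_pairs = 163

net_new_chemicals = new_chemicals - deleted_chemicals
v2a = {
    "unique_chemicals": v1a["unique_chemicals"] + net_new_chemicals,
    "treatment_pairs": v1a["treatment_pairs"] + new_treatment_pairs,
}

print(net_new_chemicals)  # 71
print(v2a)                # {'unique_chemicals': 958, 'treatment_pairs': 1999}
```

Both results agree with the v2a column of the summary table (958 unique chemical records and 1999 treatment Experiment Accession IDs).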

Whereas in v1a, the field TestSubstance_ChemicalName was used to store the Source-provided chemical name obtained from the ArrayExpress experimental record (with all abbreviations and sometimes errors), in v2a (and in all DSSTox files posted after Jan09), this Source-provided chemical name has been moved to a new field Source_ChemicalName. The Standard Chemical Field, TestSubstance_ChemicalName, now carries a default, quality-reviewed chemical name used for this substance (DSSTox_Generic_SID) across all DSSTox files (this can be a common, generic or trade name). Structure_InChI and Structure_InChIKey codes have been updated to correspond to the newly published NIST recommended standard InChI options (see http://www.epa.gov/ncct/dsstox/MoreonInChI.html#InChIDSSTox).  

** Note that a misalignment of PubChem substances to URL listings in v1a caused misdirection of some substances to ArrayExpress Experiment descriptions. This problem has been corrected in v2a.

For more information and version history, and to locate specific updated chemical records, consult the ARYEXP_LogFile in the Download Table below and version update entries in the Note_ARYEXP field.

 

ARYEXP SDF Content Summary - 06 March 2009

ARYEXP SDF Content                            Totals_v1a   Totals_v2a
# Unique Chemical Records                            887          958
DSSTox Standard Chemical Fields                       20           20
DSSTox Standard Toxicity Fields                        1            1
ARYEXP Source Fields                                   4            5
Total # Fields                                        25           26
Total # Treatment Experiment Accession IDs*         1836         1999

Chemical Content                              Counts_v1a   Counts_v2a
defined organic                                      628          674
inorganic                                             60           61
organometallic                                        20           20
no structure                                         179          203

STRUCTURE_TestedForm_DefinedOrganic:
  parent                                             544          585
  complex                                             61           66
  salt                                                23           23
  salt complex                                         0            0

TestSubstance_Description:
  single chemical compound                           669          716
  macromolecule                                      165          182
  mixture or formulation                              42           46

* Note:  Total includes replicate Experiment Accession IDs and corresponds to unique chemical-experiment pairs, which includes many cases where the same Experiment Accession ID is mapped to different unique chemicals (i.e., experiment/study tested many chemicals).

File Download Notes: The following files are offered in the Download Table below:

Structure Data File (SDF) is the main DSSTox product, providing the complete inventory of chemical structures, DSSTox Standard Chemical Fields, and all Source-specific data fields [Note: the structure field is blank for all records containing mixtures or undefined substances];
Data Table MS Excel (MS Office 2003) file contains the full SDF data contents in spreadsheet table form, minus the chemical structure field [file created with CambridgeSoft ChemDraw Ultra plug-in to MS Excel 2004];
Structures Table (PDF) file contains a tiled format graphical view of all chemical structures contained in the SDF file, annotated with TestSubstance_CASRN and truncated TestSubstance_ChemicalName field entries for the tested form of the chemical [file created with ACD ChemFolder, ver. 11.00, ACD Labs].

You will need Adobe Acrobat Reader, available as a free download, to view the Adobe PDF files on this page. See EPA's PDF page to learn more about PDF, and for a link to the free Acrobat Reader.

Zip files may be decompressed using a utility such as JZip.

File Type                       File Size   Format

Documentation & Data Files: ARYEXP
Log File                        31 KB       PDF
SDF Structure Data File         1.1 MB      Included in Zip file
• Data Table (no structures)                Included in Zip file
• Structures Table                          PDF

Documentation & Data Files: ARYEXP_Aux
SDF Structure Data File         5.2 MB      Included in Zip file
• Data Table (no structures)                Included in Zip file
• Structures Table                          PDF
• Field Definitions             69 KB       MS Word document

 

These files constitute the main DSSTox products. DSSTox Structure Data Files and DSSTox File Names adhere to strict formatting standards and conventions. For additional information, see More on DSSTox Standard Chemical Fields, Known Problems & Fixes, Chemical Information Quality Review Procedures, and How to Use DSSTox Files.
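As an illustration of the file naming convention, the two file names given in the DSSTox Citation on this page (ARYEXP_v2a_958_06Mar2009 and ARYEXP_Aux_v2a_2556_06Mar2009) appear to follow the pattern source, version, record count, and date. The Python sketch below parses names of that shape. The pattern is inferred from these two examples only and is an assumption, not the official DSSTox specification; the function name parse_dsstox_name is hypothetical.

```python
import re
from datetime import date

# Assumed layout, inferred from the two names in the DSSTox Citation:
#   <source>_v<version>_<record count>_<DDMonYYYY>
NAME_PATTERN = re.compile(
    r"^(?P<source>.+)_v(?P<version>\d+[a-z])_(?P<records>\d+)"
    r"_(?P<day>\d{2})(?P<month>[A-Z][a-z]{2})(?P<year>\d{4})$"
)
MONTHS = {
    name: number
    for number, name in enumerate(
        ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
         "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"],
        start=1,
    )
}


def parse_dsstox_name(name: str) -> dict:
    """Split a DSSTox-style file name into its assumed components."""
    match = NAME_PATTERN.match(name)
    if match is None or match["month"] not in MONTHS:
        raise ValueError(f"unrecognized file name: {name!r}")
    return {
        "source": match["source"],
        "version": match["version"],
        "records": int(match["records"]),
        "date": date(int(match["year"]), MONTHS[match["month"]], int(match["day"])),
    }


main_file = parse_dsstox_name("ARYEXP_v2a_958_06Mar2009")
aux_file = parse_dsstox_name("ARYEXP_Aux_v2a_2556_06Mar2009")
print(main_file)
print(aux_file)
```

The record count parsed from the ARYEXP name (958) matches the number of unique chemical records reported for v2a in the content summary table, which supports reading the third component as a record count.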


Acknowledgements: All original and updated file content was extracted from the on-line ArrayExpress resource by ClarLynda Williams, using a combination of automated and manual curation. QA review, corrections to submitter chemical information, and structure annotation were carried out by Maritja Wolf (Lockheed Martin, Contractor for EPA). We thank Jennifer Fostel (NIEHS CEBS) and Chihae Yang (Ohio State University) for their helpful comments in the review of this work. We also thank Tom Transue (Lockheed Martin, Contractor for EPA) for assistance with loading of ARYEXP into the DSSTox Structure-Browser and QA review, and Erik Griffis for assistance in reviewing v1a ArrayExpress content. Updated files were created by ClarLynda Williams and Maritja Wolf.

DSSTox Citation:
Williams-Devane, C.R., M.A. Wolf, and A.M. Richard (2009) DSSTox European Bioinformatics Institute (EBI) ArrayExpress Repository for Gene Expression Data (ARYEXP and ARYEXP_Aux): SDF Files and Documentation, Updated versions: ARYEXP_v2a_958_06Mar2009, ARYEXP_Aux_v2a_2556_06Mar2009, www.epa.gov/ncct/dsstox/sdf_aryexp.html

Disclaimer: Every effort is made to ensure that DSSTox SDF files and associated documentation are error-free, but neither the DSSTox Source collaborators nor the EPA DSSTox project team make guarantees of accuracy, nor are any of these persons to be held liable for any subsequent use of these public data. The contents of this webpage and supporting documents have been subjected to review by the EPA National Center for Computational Toxicology and approved for publication. Approval does not signify that the contents reflect the views of the Agency, nor does mention of trade names or commercial products constitute endorsement or recommendation for use. See additional disclaimers.

EPA/600/C-06/009
