A Journal Devoted To All Areas Of Applied Statistics
 
 
  Annals of Applied Statistics
 
Replication data for: Random-set Methods Identify Distinct Aspects of the Enrichment Signal in Gene-set Analysis
Cataloging Information
 
Citation Information
How to Cite
Michael A. Newton; Fernando A. Quintana; Johan A. den Boon; Srikumar Sengupta; and Paul Ahlquist, 2007, "Replication data for: Random-set Methods Identify Distinct Aspects of the Enrichment Signal in Gene-set Analysis", hdl:1902.1/10649 Institute for Mathematical Statistics [Distributor]
Study Global Id: hdl:1902.1/10649
Authors: Michael A. Newton (University of Wisconsin–Madison); Fernando A. Quintana (Pontificia Universidad Católica de Chile); Johan A. den Boon (University of Wisconsin–Madison); Srikumar Sengupta (WiCell Research Institute); and Paul Ahlquist (University of Wisconsin–Madison and Howard Hughes Medical Institute)
Production Date: 2007
Distributor: Institute for Mathematical Statistics
Distribution Date: 2007
Deposit Date: October 01, 2007
Replication For: Michael A. Newton, Fernando A. Quintana, Johan A. den Boon, Srikumar Sengupta, and Paul Ahlquist. 2007. "Random-set Methods Identify Distinct Aspects of the Enrichment Signal in Gene-set Analysis." Ann. Appl. Statist. Volume 1, Number 1 (2007), 85-106.
Provenance
Abstract and Scope
Abstract

A prespecified set of genes may be enriched, to varying degrees, for genes that have altered expression levels relative to two or more states of a cell. Knowing the enrichment of gene sets defined by functional categories, such as gene ontology (GO) annotations, is valuable for analyzing the biological signals in microarray expression data. A common approach to measuring enrichment is by cross-classifying genes according to membership in a functional category and membership on a selected list of significantly altered genes. A small Fisher’s exact test p-value, for example, in this 2×2 table is indicative of enrichment. Other category analysis methods retain the quantitative gene-level scores and measure significance by referring a category-level statistic to a permutation distribution associated with the original differential expression problem. We describe a class of random-set scoring methods that measure distinct components of the enrichment signal. The class includes Fisher’s test based on selected genes and also tests that average gene-level evidence across the category. Averaging and selection methods are compared empirically using Affymetrix data on expression in nasopharyngeal cancer tissue, and theoretically using a location model of differential expression. We find that each method has a domain of superiority in the state space of enrichment problems, and that both methods have benefits in practice. Our analysis also addresses two problems related to multiple-category inference, namely, that equally enriched categories are not detected with equal probability if they are of different sizes, and also that there is dependence among category statistics owing to shared genes. Random-set enrichment calculations do not require Monte Carlo for implementation. They are made available in the R package allez.
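The two kinds of category score contrasted in the abstract can be illustrated with a minimal Python sketch. This is not the allez package or its API; the function names `fisher_upper_tail` and `random_set_z` and the toy inputs are hypothetical, chosen only for illustration. The selection score is the one-sided Fisher's exact (hypergeometric) tail for the 2×2 table, and the averaging score standardizes a category's mean gene-level score by its mean and variance when the category is treated as a random set of m genes drawn without replacement from all G genes, which is the standard finite-population sampling result and needs no Monte Carlo.

```python
from math import comb, sqrt


def fisher_upper_tail(G, m, n_selected, overlap):
    """One-sided Fisher's exact p-value for the 2x2 table.

    G          -- total number of genes
    m          -- number of genes in the category
    n_selected -- number of genes on the selected (significant) list
    overlap    -- number of category genes that are also on the list
    Returns P(X >= overlap) for a hypergeometric X.
    """
    upper = min(m, n_selected)
    total = comb(G, m)
    tail = sum(
        comb(n_selected, k) * comb(G - n_selected, m - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


def random_set_z(scores, category):
    """Standardized category average under a random-set null.

    scores   -- list of gene-level scores, one per gene
    category -- list of integer indices into `scores`
    The null mean and variance are those of the average of m scores
    drawn without replacement from the G available scores.
    """
    G = len(scores)
    m = len(category)
    mu = sum(scores) / G
    sigma2 = sum((s - mu) ** 2 for s in scores) / G  # population variance
    xbar = sum(scores[g] for g in category) / m
    var = (sigma2 / m) * (G - m) / (G - 1)  # finite-population correction
    return (xbar - mu) / sqrt(var)


if __name__ == "__main__":
    # Toy numbers only; not taken from the replication data.
    print(fisher_upper_tail(G=10, m=3, n_selected=4, overlap=2))
    print(random_set_z([1, 2, 3, 4], [2, 3]))
```

If the gene-level scores passed to `random_set_z` are 0/1 indicators of membership on the selected list, the same standardization gives a selection-type score, which is one way to see both approaches as members of a single class.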

Keywords: Conditional testing; gene ontology; gene set enrichment analysis; host-virus association in nasopharyngeal carcinoma; selection versus average evidence; significance analysis of function and expression
Terms of Use
Network Terms of Use: IQSS Dataverse Network Terms and Conditions

By downloading these Materials, I agree to the following:

  1. I will not use the Materials to
    1. obtain information that could directly or indirectly identify subjects.
    2. produce links among the Distributor's datasets or among the Distributor's data and other datasets that could identify individuals or organizations.
    3. obtain information about, or further contact with, subjects known to me except where the use and/or release of such identifying information has no potential for constituting an unwarranted invasion of privacy and/or breach of confidentiality.
  2. I agree not to download any Materials where prohibited by applicable law.
  3. I agree not to use the Materials in any way prohibited by applicable law.
  4. I agree that any books, articles, conference papers, theses, dissertations, reports, or other publications that I create which employ data will reference the bibliographic citation accompanying this data. These citations include the data authors, data identifier, and other information in accord with the Recommended Standard (http://thedata.org/citation/standard) for social science data.
  5. THE DISTRIBUTOR MAKES NO WARRANTIES, EXPRESS OR IMPLIED, BY OPERATION OF LAW OR OTHERWISE, REGARDING OR RELATING TO THE DATASET.

BY CLICKING THE "I AGREE" CHECKBOX BELOW, I CONFIRM THAT I HAVE READ AND UNDERSTOOD EACH AND EVERY TERM SET FORTH IN THE TERMS AND CONDITIONS FOR THE USE OF DATA FOUND ABOVE, AND I AGREE TO BE BOUND BY ALL OF SUCH TERMS AND CONDITIONS.

IF I DO NOT UNDERSTAND OR AGREE TO ALL OF THE TERMS AND CONDITIONS, I MUST NOT DOWNLOAD THE MATERIALS.

Other Information