
GWAS Data Analysis
Raw GWAS data go through a systematic series of screening, curation, and analysis steps.

Genotype calls are obtained from raw data using standard methods recommended by the platform manufacturer. When raw data are already available in the form of genotype calls, they are used directly.

Sample Annotation

When raw or genotype-level data are available, each sample in the study is annotated with the following information:

» Family ID
» Sample ID
» Paternal ID (for family-based studies)
» Maternal ID (for family-based studies)
» Sex
» Phenotype (discrete or quantitative)
» Group/cluster (eg, geographical region) to assess possible effects of population stratification.
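The annotation fields above line up with the six columns of a PLINK-style .fam file, plus a separate group label. The sketch below assumes that layout and PLINK's usual codes (sex 1 = male, 2 = female, 0 = unknown; phenotype -9 = missing; parent ID 0 = not in the data set). The names `SampleAnnotation` and `parse_fam_line` are hypothetical and not part of any pipeline described here.

```python
from dataclasses import dataclass
from typing import Optional

# Codes follow the PLINK .fam convention (an assumption for this sketch).
SEX_CODES = {"1": "male", "2": "female", "0": "unknown"}
MISSING_PHENOTYPE = "-9"  # PLINK's default missing-phenotype value (assumed)


@dataclass
class SampleAnnotation:
    family_id: str
    sample_id: str
    paternal_id: Optional[str]   # None for founders (father not in the data set)
    maternal_id: Optional[str]   # None for founders (mother not in the data set)
    sex: str
    phenotype: Optional[str]     # kept as text: may be discrete or quantitative
    group: Optional[str] = None  # e.g. geographical region, for stratification checks


def parse_fam_line(line: str, group: Optional[str] = None) -> SampleAnnotation:
    """Parse one whitespace-separated .fam-style record (6 columns) into a SampleAnnotation."""
    fid, iid, pat, mat, sex, pheno = line.split()[:6]
    return SampleAnnotation(
        family_id=fid,
        sample_id=iid,
        paternal_id=None if pat == "0" else pat,
        maternal_id=None if mat == "0" else mat,
        sex=SEX_CODES.get(sex, "unknown"),
        phenotype=None if pheno == MISSING_PHENOTYPE else pheno,
        group=group,
    )
```

For example, `parse_fam_line("FAM1 IND3 IND1 IND2 2 2", group="north")` yields a female sample with both parents recorded and phenotype "2".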

Raw Data Summary Statistics

After basic preprocessing, in which invalid genotypes are set to “missing” (eg, Y-chromosome genotypes in females and heterozygous calls on haploid chromosomes), the following summary statistics are generated before QC filtering:

» Missing genotype rate
» Missing rates by case or control status
» HWE failures
» Mendel errors (family-based data only)
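The preprocessing and missingness statistics above can be sketched on a plain genotype matrix. This assumes genotypes are coded as 0/1/2 copies of one allele with `None` for a missing call, and that the caller already knows which columns are non-pseudoautosomal X markers and which are Y markers; all function names are hypothetical.

```python
from typing import List, Optional, Sequence, Set

Genotype = Optional[int]               # 0, 1 or 2 copies of the counted allele; None = missing
Matrix = Sequence[Sequence[Genotype]]  # rows = samples, columns = markers


def mask_invalid(geno: Matrix, is_male: Sequence[bool],
                 x_nonpar: Set[int], y_markers: Set[int]) -> List[List[Genotype]]:
    """Copy of geno with invalid calls set to missing (None).

    x_nonpar / y_markers: column indices of non-pseudoautosomal X markers and of Y markers.
    """
    out = [list(row) for row in geno]
    for i, male in enumerate(is_male):
        for j in y_markers:
            if not male or out[i][j] == 1:
                out[i][j] = None  # Y call in a female, or heterozygous call on haploid Y
        if male:
            for j in x_nonpar:
                if out[i][j] == 1:
                    out[i][j] = None  # heterozygous call on haploid X in a male
    return out


def sample_missing_rates(geno: Matrix) -> List[float]:
    """Fraction of missing calls per sample (row)."""
    return [sum(g is None for g in row) / len(row) for row in geno]


def marker_missing_rates(geno: Matrix, rows: Optional[Sequence[int]] = None) -> List[float]:
    """Fraction of missing calls per marker (column), optionally over a subset of samples."""
    idx = list(range(len(geno))) if rows is None else list(rows)
    return [sum(geno[i][j] is None for i in idx) / len(idx) for j in range(len(geno[0]))]


def missing_by_status(geno: Matrix, is_case: Sequence[bool]):
    """Per-marker missing rates in cases and in controls (differential missingness)."""
    cases = [i for i, c in enumerate(is_case) if c]
    controls = [i for i, c in enumerate(is_case) if not c]
    return marker_missing_rates(geno, cases), marker_missing_rates(geno, controls)
```

A large gap between the case and control missing rates for a marker is a warning sign that calling quality differs between the groups.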

Quality control (QC) filtering of individual samples and markers uses summary statistics such as minor allele frequency (MAF), Hardy-Weinberg equilibrium (HWE) p-value, and call rate.

The following filtering criteria are applied to individual samples or markers.

Sample QC
Individual data are removed from further analysis if:
» Missing genotype rate is > 10%
» Mendel errors are > 5% (family-based data only)
» Sex inferred from chromosome X data differs from the reported sex.
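The sample filters above can be sketched as follows, reading the thresholds as 10% missingness and 5% Mendel errors. The sex check infers sex from a simplified method-of-moments inbreeding estimate F on non-pseudoautosomal X markers, where males (hemizygous) should look fully homozygous; the 0.2/0.8 cut-offs are borrowed from PLINK's `--check-sex` defaults as an assumption, and the function names are hypothetical.

```python
import math
from typing import List, Optional, Sequence


def x_inbreeding_f(x_calls: Sequence[Optional[int]], x_freqs: Sequence[float]) -> float:
    """Simplified F on chromosome X: (O_hom - E_hom) / (N - E_hom).

    x_calls: one sample's genotypes (0/1/2, None = missing) at non-pseudoautosomal X markers;
    x_freqs: allele frequencies at the same markers. Males should give F near 1.
    """
    obs_hom = exp_hom = 0.0
    n = 0
    for g, p in zip(x_calls, x_freqs):
        if g is None:
            continue
        n += 1
        obs_hom += g != 1                  # homozygous call observed
        exp_hom += 1 - 2 * p * (1 - p)     # homozygosity expected under HWE
    return (obs_hom - exp_hom) / (n - exp_hom) if n > exp_hom else math.nan


def call_sex(f: float) -> str:
    # 0.2 / 0.8 cut-offs assumed from PLINK's --check-sex defaults
    if f > 0.8:
        return "male"
    if f < 0.2:
        return "female"
    return "ambiguous"


def failing_samples(missing_rate, reported_sex, x_f, mendel_rate=None,
                    max_missing=0.10, max_mendel=0.05) -> List[int]:
    """Indices of samples that fail any of the sample-QC rules."""
    failed = []
    for i, rate in enumerate(missing_rate):
        too_missing = rate > max_missing
        too_many_mendel = mendel_rate is not None and mendel_rate[i] > max_mendel
        # An ambiguous X-based call also counts as a discrepancy here (assumption)
        sex_mismatch = (reported_sex[i] in ("male", "female")
                        and call_sex(x_f[i]) != reported_sex[i])
        if too_missing or too_many_mendel or sex_mismatch:
            failed.append(i)
    return failed
```

The Mendel-error rate is only passed for family-based data; population samples simply omit it.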

Marker QC
SNVs are discarded from further analysis if:
» MAF in both Cases and Controls or overall MAF is
» HWE p-value in Controls or overall HWE is < 1 × 10⁻⁶
» Average Call Rate in Cases and Controls or overall Call Rate is < 95%
» Mendel errors are > 10% (family-based data only).
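A per-marker version of the call-rate, HWE, and MAF filters is sketched below. The MAF cut-off is not stated above, so `min_maf` is a required parameter rather than a guessed default, and the sketch assumes the rule discards markers below it. The HWE p-value here uses the 1-df chi-square approximation; many pipelines use an exact test instead. To apply the "in Controls" variant of the HWE rule, pass only the control genotypes. Function names are hypothetical.

```python
import math
from typing import Optional, Sequence


def hwe_chisq_p(n0: int, n1: int, n2: int) -> float:
    """HWE p-value from genotype counts via the 1-df chi-square approximation."""
    n = n0 + n1 + n2
    if n == 0:
        return 1.0
    q = (n1 + 2 * n2) / (2 * n)  # frequency of the counted allele
    expected = (n * (1 - q) ** 2, 2 * n * q * (1 - q), n * q ** 2)
    if min(expected) == 0:
        return 1.0  # monomorphic marker: the test is undefined, so it is not flagged here
    chi = sum((o - e) ** 2 / e for o, e in zip((n0, n1, n2), expected))
    return math.erfc(math.sqrt(chi / 2))  # upper tail of chi-square with 1 df


def marker_passes(calls: Sequence[Optional[int]], min_maf: float,
                  min_hwe_p: float = 1e-6, min_call_rate: float = 0.95) -> bool:
    """True if one marker (genotypes 0/1/2, None = missing, across samples) passes QC."""
    observed = [g for g in calls if g is not None]
    if not observed:
        return False
    call_rate = len(observed) / len(calls)
    n0, n1, n2 = (observed.count(k) for k in (0, 1, 2))
    q = (n1 + 2 * n2) / (2 * len(observed))
    maf = min(q, 1 - q)
    return (maf >= min_maf
            and hwe_chisq_p(n0, n1, n2) >= min_hwe_p
            and call_rate >= min_call_rate)
```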

Stratification Analysis

To investigate possible confounding by population stratification, samples are grouped by complete-linkage agglomerative clustering based on pairwise genome-wide identity-by-state (IBS) distance.
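The clustering step can be sketched in two parts: an IBS distance between two samples, and naive complete-linkage merging, where the distance between two clusters is the largest pairwise distance between their members. The sketch assumes biallelic genotypes coded 0/1/2 with `None` for missing, and it stops at a caller-chosen number of clusters; that stopping rule is an assumption for illustration, and the function names are hypothetical.

```python
from typing import List, Optional, Sequence

Genotypes = Sequence[Optional[int]]  # 0/1/2 copies of one allele per marker; None = missing


def ibs_distance(a: Genotypes, b: Genotypes) -> float:
    """1 - average proportion of alleles shared identical-by-state over markers called in both."""
    shared = n = 0
    for x, y in zip(a, b):
        if x is None or y is None:
            continue
        shared += 2 - abs(x - y)  # 2, 1 or 0 alleles IBS at a biallelic marker
        n += 1
    return 1 - shared / (2 * n) if n else 1.0  # no overlap: maximally distant (assumption)


def complete_linkage(dist: Sequence[Sequence[float]], n_clusters: int) -> List[List[int]]:
    """Naive agglomerative clustering; distance between clusters = largest pairwise distance."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = max(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters[b])  # merge the closest pair of clusters
        del clusters[b]
    return clusters
```

This naive loop is only meant to show the linkage rule; for genome-scale sample sizes, SciPy's `scipy.cluster.hierarchy.linkage(..., method="complete")` on a condensed distance matrix does the same job much faster.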

Multi-Dimensional Scaling

Multidimensional scaling (MDS) of the pairwise IBS distance matrix represents each sample as a point in a small number of dimensions. Plotting these dimensions against each other helps identify clusters of samples. A typical visualization plots the first dimension against the second, with individuals color-coded by cluster information (eg, ancestry or geographical location).
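As a sketch of where the plotted dimensions come from, classical (Torgerson) MDS double-centres the squared distance matrix and takes its leading eigenvectors, scaled by the square roots of their eigenvalues. The pure-Python version below finds those eigenvectors by shifted power iteration with deflation, purely to stay dependency-free; for real data an eigen-solver such as `numpy.linalg.eigh`, or PLINK's own MDS output, would be the practical choice. The function name and parameters are hypothetical.

```python
import math
import random
from typing import List, Sequence


def classical_mds(dist: Sequence[Sequence[float]], k: int = 2,
                  iters: int = 1000, seed: int = 0) -> List[List[float]]:
    """Classical MDS: k coordinates per sample from a symmetric distance matrix."""
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(row) / n for row in sq]
    grand = sum(row_mean) / n
    # Double-centred matrix B = -1/2 * J D^2 J (row and column means agree: D is symmetric)
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand) for j in range(n)]
         for i in range(n)]
    # Shift by a Gershgorin bound so power iteration targets the largest eigenvalue,
    # not the one of largest magnitude (B can have negative eigenvalues).
    shift = max(sum(abs(x) for x in row) for row in b)
    rng = random.Random(seed)
    coords = [[0.0] * k for _ in range(n)]
    for c in range(k):
        v = [rng.uniform(-1, 1) for _ in range(n)]
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) + shift * v[i] for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w)) or 1.0
            v = [x / norm for x in w]
        lam = sum(v[i] * b[i][j] * v[j] for i in range(n) for j in range(n))  # Rayleigh quotient
        if lam <= 1e-12:
            break  # no positive eigenvalue left: remaining dimensions stay 0
        for i in range(n):
            coords[i][c] = v[i] * math.sqrt(lam)
        # Deflate so the next pass converges to the next eigenvector
        b = [[b[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords
```

Each row of the result gives one sample's (dimension 1, dimension 2) point, ready for a scatter plot colored by the group/cluster annotation.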

Population-Based Association Testing
Case-control analysis
For all markers in the data set, multiple association tests can be performed:

» Allelic association test
» Cochran-Armitage trend test
» Dominant gene action (1 degree of freedom [df]) test
» Recessive gene action (1 df) test
» Genotypic (2 df) test.
Odds ratios with confidence intervals (95% by default) are reported.
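The five tests listed above can be sketched from a marker's genotype counts in cases and controls. The allelic, dominant, recessive, and genotypic tests are Pearson chi-square tests on differently collapsed tables; the trend test is the Cochran-Armitage statistic with weights 0, 1, 2; and the odds ratio uses a Woolf (log-scale) confidence interval. This is an illustrative sketch assuming no empty table cells (real tools add corrections for those), with hypothetical function names.

```python
import math


def chisq_table(rows):
    """Pearson chi-square statistic for an r x c table of counts (no empty row/column)."""
    total = sum(map(sum, rows))
    col_sums = [sum(col) for col in zip(*rows)]
    stat = 0.0
    for row in rows:
        row_sum = sum(row)
        for observed, col_sum in zip(row, col_sums):
            expected = row_sum * col_sum / total
            stat += (observed - expected) ** 2 / expected
    return stat


def p_1df(x):
    return math.erfc(math.sqrt(x / 2))  # chi-square upper tail, 1 df


def p_2df(x):
    return math.exp(-x / 2)  # chi-square upper tail, 2 df (closed form)


def trend_chisq(r, s, t=(0, 1, 2)):
    """Cochran-Armitage trend statistic; r, s = case / control genotype counts."""
    R, S = sum(r), sum(s)
    N = R + S
    n = [ri + si for ri, si in zip(r, s)]
    T = sum(ti * (ri * S - si * R) for ti, ri, si in zip(t, r, s))
    var = R * S / N * (N * sum(ti * ti * ni for ti, ni in zip(t, n))
                       - sum(ti * ni for ti, ni in zip(t, n)) ** 2)
    return T * T / var


def case_control_tests(r, s, z=1.959964):
    """r, s: (n0, n1, n2) counts of 0/1/2 copies of the tested allele in cases / controls."""
    a, b = r[1] + 2 * r[2], r[1] + 2 * r[0]  # tested / other alleles in cases
    c, d = s[1] + 2 * s[2], s[1] + 2 * s[0]  # tested / other alleles in controls
    allelic = chisq_table([[a, b], [c, d]])
    dominant = chisq_table([[r[1] + r[2], r[0]], [s[1] + s[2], s[0]]])
    recessive = chisq_table([[r[2], r[0] + r[1]], [s[2], s[0] + s[1]]])
    genotypic = chisq_table([list(r), list(s)])
    trend = trend_chisq(r, s)
    # Allelic odds ratio; z = 1.959964 gives a 95% interval
    odds = a * d / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    ci = (math.exp(math.log(odds) - z * se), math.exp(math.log(odds) + z * se))
    return {
        "allelic": (allelic, p_1df(allelic)),
        "trend": (trend, p_1df(trend)),
        "dominant": (dominant, p_1df(dominant)),
        "recessive": (recessive, p_1df(recessive)),
        "genotypic": (genotypic, p_2df(genotypic)),
        "odds_ratio": (odds, ci),
    }
```

For example, `case_control_tests((10, 20, 30), (30, 20, 10))` gives an allelic odds ratio of 4.0 and a trend statistic of 20.0.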
Association analysis statistics are visualized using Manhattan plots, histograms showing the p-value distribution, and Q-Q plots.
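The data behind these three plots can be prepared with a few small helpers, leaving the drawing to any plotting library. The sketch assumes numeric chromosome labels and p-values strictly greater than 0, and it uses (i + 0.5)/n as the expected uniform quantile, which is one common convention among several. The function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def qq_points(pvalues: Sequence[float]) -> List[Tuple[float, float]]:
    """(expected, observed) -log10(p) pairs for a Q-Q plot against the uniform null."""
    n = len(pvalues)
    return [(-math.log10((i + 0.5) / n), -math.log10(p))
            for i, p in enumerate(sorted(pvalues))]


def p_histogram(pvalues: Sequence[float], bins: int = 10) -> List[int]:
    """Counts of p-values in equal-width bins on [0, 1]; roughly flat under the null."""
    counts = [0] * bins
    for p in pvalues:
        counts[min(int(p * bins), bins - 1)] += 1  # p = 1.0 falls in the last bin
    return counts


def manhattan_coords(hits: Sequence[Tuple[int, int, float]]) -> List[Tuple[int, float, int]]:
    """Map (chromosome, position, p) to (genome-wide x, -log10 p, chromosome)."""
    max_pos: Dict[int, int] = {}
    for chrom, pos, _ in hits:
        max_pos[chrom] = max(max_pos.get(chrom, 0), pos)
    start: Dict[int, int] = {}
    offset = 0
    for chrom in sorted(max_pos):  # chromosomes laid end to end in numeric order
        start[chrom] = offset
        offset += max_pos[chrom]
    return [(start[chrom] + pos, -math.log10(p), chrom) for chrom, pos, p in hits]
```

Points far above the diagonal at the top end of the Q-Q plot suggest true signals, while a curve lifted off the diagonal throughout hints at inflation from stratification.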

Family-Based Association Testing

Family-based association testing for disease traits is conducted using the standard transmission disequilibrium test (TDT).
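The TDT compares how often heterozygous parents transmit allele A (b) versus the other allele (c) to affected offspring, using the statistic (b - c)² / (b + c) with 1 degree of freedom. The sketch below assumes each record is one (father, mother, affected child) trio coded as 0/1/2 copies of allele A, so a family with several affected children contributes one trio per child. Trios that violate Mendelian inheritance are skipped. Function names are hypothetical.

```python
import math
from typing import Iterable, Optional, Tuple

Trio = Tuple[Optional[int], Optional[int], Optional[int]]  # (father, mother, child): copies of A


def tdt_counts(trios: Iterable[Trio]) -> Tuple[int, int]:
    """(b, c): transmissions of allele A and of the other allele from heterozygous parents."""
    b = c = 0
    for father, mother, child in trios:
        if None in (father, mother, child):
            continue
        hets = (father == 1) + (mother == 1)
        if hets == 0:
            continue  # no heterozygous parent: uninformative
        if hets == 1:
            hom = mother if father == 1 else father
            from_het = child - hom // 2  # homozygous parent passes 0 (if 0) or 1 (if 2) copies
            if from_het not in (0, 1):
                continue  # Mendelian inconsistency: skip the trio
            b += from_het
            c += 1 - from_het
        else:
            # Both parents heterozygous: the child's two alleles are the two transmissions
            b += child
            c += 2 - child
    return b, c


def tdt(trios: Iterable[Trio]) -> Tuple[float, float]:
    """TDT statistic (b - c)^2 / (b + c) and its 1-df chi-square p-value."""
    b, c = tdt_counts(trios)
    if b + c == 0:
        return 0.0, 1.0
    chi = (b - c) ** 2 / (b + c)
    return chi, math.erfc(math.sqrt(chi / 2))
```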

To account for the confounding effects of population stratification in family-based association studies, p-values are corrected using genomic control. To adjust for multiple testing, a false discovery rate (FDR) analysis is applied, and SNVs whose q-value exceeds the threshold (default q = 0.2) are discarded.
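Both corrections can be sketched on a list of 1-df chi-square statistics. Genomic control estimates an inflation factor λ as the median statistic divided by the median of a 1-df chi-square distribution (about 0.4549) and divides every statistic by λ; flooring λ at 1 is a common convention assumed here. The text does not name the FDR procedure, so the Benjamini-Hochberg method is used as an assumption. Function names are hypothetical.

```python
import math
import statistics
from typing import List, Sequence, Tuple

CHI2_1DF_MEDIAN = 0.4549364  # median of the chi-square distribution with 1 df


def genomic_control(chisq: Sequence[float]) -> Tuple[float, List[float]]:
    """Inflation factor lambda and p-values after dividing each 1-df statistic by lambda."""
    # Flooring lambda at 1 (never inflating statistics) is an assumed convention
    lam = max(statistics.median(chisq) / CHI2_1DF_MEDIAN, 1.0)
    return lam, [math.erfc(math.sqrt(x / lam / 2)) for x in chisq]


def bh_qvalues(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running = min(running, pvalues[i] * m / rank)
        q[i] = running
    return q


def keep_snvs(snv_ids: Sequence[str], chisq: Sequence[float], max_q: float = 0.2) -> List[str]:
    """SNVs whose genomic-control-corrected p-value has a BH q-value at or below max_q."""
    _, p = genomic_control(chisq)
    return [s for s, qi in zip(snv_ids, bh_qvalues(p)) if qi <= max_q]
```

A λ close to 1 indicates little residual inflation; values well above 1 suggest stratification or other systematic bias that the correction only partly absorbs.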

© 2017-18 Genograce. All rights reserved.