Genotype calls are obtained from raw data using standard methods recommended by the platform manufacturer. When raw data are already available in the form of genotype calls, they are used directly.
When raw or genotype-level data are available, each sample from the study is annotated with the following information:
» Family ID
» Sample ID
» Paternal ID (for family-based studies)
» Maternal ID (for family-based studies)
» Sex
» Phenotype (discrete or quantitative)
» Group/cluster (eg, geographical region) to assess possible effects of population stratification.
Genotype calls are analyzed after basic preprocessing, which includes setting invalid genotypes to “missing” (eg, female Y genotypes, heterozygous haploid chromosome calls). Various summary statistics are generated before QC filtering, as follows:
» Missing genotype rate
» Missing rates by case or control status
» HWE failures
» Mendel errors (family-based data only)
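As a rough illustration of two of these summary statistics, the sketch below computes a missing genotype rate and a 1-df chi-square test for Hardy-Weinberg equilibrium. The genotype encoding (0, 1, 2 copies of one allele, with None for a missing call) and the function names are assumptions made for this example. Many tools use an exact HWE test instead; the chi-square form is used here only because it is short.

```python
import math

def missing_rate(genotypes):
    """Fraction of missing calls (None) in a list of genotype calls."""
    if not genotypes:
        return 0.0
    return sum(g is None for g in genotypes) / len(genotypes)

def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square HWE test (1 df) from genotype counts.

    Returns (statistic, p_value).
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotype calls")
    p = (2 * n_aa + n_ab) / (2 * n)   # frequency of allele A
    q = 1.0 - p                       # frequency of allele B
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper-tail probability of a chi-square variable with 1 df.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

if __name__ == "__main__":
    print(missing_rate([0, 1, None, 2]))
    print(hwe_chi_square(25, 50, 25))
```

A marker with a very small HWE p-value would be counted as an HWE failure under whatever threshold the analysis uses.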
Quality control (QC) filtering of individual samples and markers is done using summary statistics such as minor allele frequency (MAF), Hardy-Weinberg equilibrium (HWE) p-value, and call rate.
The following filtering criteria are applied to individual samples or markers.
To investigate the possible confounding effects of population stratification, complete-linkage agglomerative clustering based on pairwise genome-wide identity-by-state (IBS) distance is employed.
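The clustering step can be sketched as follows, assuming genotypes are coded as 0, 1, or 2 copies of one allele (None for missing) and that IBS distance is defined as one minus the average proportion of alleles shared. The function names and the naive cubic-time merge loop are illustrative choices for this example, not a description of any particular tool.

```python
def ibs_distance(g1, g2):
    """1 minus the mean proportion of alleles shared IBS over jointly called markers."""
    shared = 0
    n = 0
    for a, b in zip(g1, g2):
        if a is None or b is None:
            continue                      # skip markers missing in either sample
        shared += 2 - abs(a - b)          # 2, 1, or 0 alleles shared
        n += 1
    if n == 0:
        raise ValueError("no markers called in both samples")
    return 1.0 - shared / (2 * n)

def complete_linkage(samples, n_clusters):
    """Merge clusters until n_clusters remain; returns lists of sample indices."""
    n = len(samples)
    dist = [[ibs_distance(samples[i], samples[j]) for j in range(n)]
            for i in range(n)]
    clusters = [[i] for i in range(n)]
    while len(clusters) > n_clusters:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                # Complete linkage: cluster distance is the largest pairwise distance.
                d = max(dist[i][j] for i in clusters[x] for j in clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return [sorted(c) for c in clusters]

if __name__ == "__main__":
    samples = [[0, 0, 0, 0], [0, 0, 0, 1], [2, 2, 2, 2], [2, 2, 1, 2]]
    print(complete_linkage(samples, 2))
```

Complete linkage tends to produce compact clusters because two groups are merged only when even their most dissimilar members are close.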
Plotting the various dimensions against each other can be useful for identifying any clustering of samples. A typical visualization exercise is plotting the first dimension vs. second dimension and color-coding the individuals according to the cluster information (eg, ancestry and geographical location).
» Allelic association test
» Cochran-Armitage trend test
» Dominant gene action (1 degree of freedom [df]) test
» Recessive gene action (1 df) test
» Genotypic (2 df) test.
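Two of the tests listed above can be sketched from genotype counts in cases and controls. The example assumes counts are given in the order (AA, AB, BB), uses additive weights (0, 1, 2) for the trend test, and uses function names invented for illustration. The dominant and recessive tests collapse the genotypes into a 2x2 table and could reuse the same 2x2 helper.

```python
import math

def chi2_p_value_1df(stat):
    """Upper-tail p-value of a chi-square statistic with 1 df."""
    return math.erfc(math.sqrt(stat / 2))

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0

def allelic_test(cases, controls):
    """Compare allele counts (A vs B) between cases and controls."""
    case_a, case_b = 2 * cases[0] + cases[1], 2 * cases[2] + cases[1]
    ctrl_a, ctrl_b = 2 * controls[0] + controls[1], 2 * controls[2] + controls[1]
    return chi_square_2x2(case_a, case_b, ctrl_a, ctrl_b)

def cochran_armitage_trend(cases, controls, weights=(0, 1, 2)):
    """Cochran-Armitage trend statistic (1 df) across the three genotypes."""
    totals = [r + s for r, s in zip(cases, controls)]
    n_cases, n_controls = sum(cases), sum(controls)
    n = n_cases + n_controls
    sum_wr = sum(w * r for w, r in zip(weights, cases))
    sum_wn = sum(w * t for w, t in zip(weights, totals))
    sum_w2n = sum(w * w * t for w, t in zip(weights, totals))
    denom = n_cases * n_controls * (n * sum_w2n - sum_wn ** 2)
    return n * (n * sum_wr - n_cases * sum_wn) ** 2 / denom if denom else 0.0

if __name__ == "__main__":
    cases, controls = (10, 40, 50), (30, 45, 25)
    for name, stat in (("allelic", allelic_test(cases, controls)),
                       ("trend", cochran_armitage_trend(cases, controls))):
        print(name, round(stat, 3), chi2_p_value_1df(stat))
```

Unlike the allelic test, the trend test treats individuals rather than alleles as the sampling unit, so it does not rely on HWE in the combined sample.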
Odds ratios with confidence intervals (default 95%) are provided.
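One common way to obtain such an interval is the logit (Woolf) method, sketched below for a 2x2 table of allele counts. The choice of this method, the function name, and the argument order are assumptions for this example; the source does not state which interval method is used.

```python
import math
from statistics import NormalDist

def odds_ratio_ci(case_a, case_b, ctrl_a, ctrl_b, level=0.95):
    """Odds ratio for allele A (cases vs controls) with a Woolf confidence interval.

    Returns (odds_ratio, lower, upper). All four counts must be positive.
    """
    if min(case_a, case_b, ctrl_a, ctrl_b) <= 0:
        raise ValueError("all cell counts must be positive")
    odds_ratio = (case_a * ctrl_b) / (case_b * ctrl_a)
    # Standard error of log(OR) under the logit method.
    se = math.sqrt(1 / case_a + 1 / case_b + 1 / ctrl_a + 1 / ctrl_b)
    z = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for a 95% interval
    log_or = math.log(odds_ratio)
    return odds_ratio, math.exp(log_or - z * se), math.exp(log_or + z * se)

if __name__ == "__main__":
    print(odds_ratio_ci(20, 10, 10, 20))
```

An interval that excludes 1 corresponds to a nominally significant association at the matching significance level.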
Association analysis statistics are visualized using Manhattan plots, histograms showing the p-value distribution, and Q-Q plots.
Family-based association testing for disease traits is conducted using the standard transmission disequilibrium test (TDT).
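The basic TDT statistic is a McNemar-type test on transmissions from heterozygous parents to affected offspring. The sketch below assumes the transmission counts have already been tallied; the function and parameter names are hypothetical.

```python
import math

def tdt(transmitted, untransmitted):
    """Transmission disequilibrium test for one allele.

    transmitted:   times heterozygous parents passed the allele to an affected child
    untransmitted: times heterozygous parents passed the other allele instead
    Returns (chi_square, p_value) with 1 df.
    """
    total = transmitted + untransmitted
    if total == 0:
        raise ValueError("no informative transmissions")
    stat = (transmitted - untransmitted) ** 2 / total
    p_value = math.erfc(math.sqrt(stat / 2))   # chi-square upper tail, 1 df
    return stat, p_value

if __name__ == "__main__":
    print(tdt(30, 10))
```

Only heterozygous parents are informative, because a homozygous parent always transmits the same allele.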
To account for the confounding effects of population stratification in family-based association studies, p-values are corrected using genomic control. To adjust for multiple testing, a false-discovery rate (FDR) analysis is used. SNVs that do not meet the FDR threshold (default q value = 0.2) are discarded.
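Both corrections can be sketched briefly. The example assumes 1-df chi-square statistics as input, estimates the genomic control inflation factor from their median, and uses the Benjamini-Hochberg procedure for q-values. The source does not say which FDR procedure is applied, so Benjamini-Hochberg is an assumption here, as are the function names.

```python
import math
from statistics import median

CHI2_1DF_MEDIAN = 0.4549364   # median of a chi-square distribution with 1 df

def genomic_control(chi2_stats):
    """Return (lambda, corrected p-values) for a list of 1-df chi-square statistics."""
    # Lambda is floored at 1 so statistics are never inflated by the correction.
    lam = max(1.0, median(chi2_stats) / CHI2_1DF_MEDIAN)
    p_values = [math.erfc(math.sqrt(x / lam / 2)) for x in chi2_stats]
    return lam, p_values

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg q-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q_values[i] = running_min
    return q_values

if __name__ == "__main__":
    lam, p_values = genomic_control([0.1, 0.5, 1.2, 3.0, 15.0])
    q_values = benjamini_hochberg(p_values)
    kept = [i for i, q in enumerate(q_values) if q <= 0.2]   # default threshold from the text
    print(lam, kept)
```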