Frequently Asked Questions

Answers to the most commonly asked questions

General

How should I organize my input data?

For the trait of interest, if each individual has only a single measurement, then organize your data into three separate files: a file with the marker data, a file with the phenotypic data, and a file with the marker map data.

If multiple measurements have been made per individual for the trait of interest (as is common in plant studies), then four separate files are needed: a file with the marker data, a file with the phenotypic data, a file with the marker map data, and a Z matrix file that contains 0s and 1s that match each measurement to its individual.

Can Eagle deal with inbreds or can it only handle data on outbred individuals?

Eagle can handle data recorded on inbred or outbred individuals but we do assume all individuals are diploid.

What is the relationship between R and Eagle?

R is the language upon which Eagle is built. However, much of the inner workings of Eagle are written in C++ and interfaced with R via Rcpp and RcppEigen.

Do I need a marker map?

Eagle will run without a marker map being supplied. However, for interpretability of the results, it is best to supply a marker map.

Will Eagle check for errors?

Eagle will check and report on some errors, especially with the input files. However, there are errors that we cannot capture. For example, we cannot check that the columns of SNP data in the marker file are in the same order as the SNPs in the marker map file. Also, we cannot check that the row order of individuals in the phenotype file matches the row order of individuals in the marker file. The best we can do is to check that the number of columns and the number of rows are consistent across the files.
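The row and column counts can also be compared by hand before the data are loaded. The following is a minimal sketch in base R, not Eagle's own checking code. The file contents are made up for illustration, and the sketch assumes that the phenotype and map files each have a header row while the marker file has none.

```r
# Hypothetical toy files written to temporary locations.
marker_file <- tempfile(fileext = ".txt")
pheno_file  <- tempfile(fileext = ".txt")
map_file    <- tempfile(fileext = ".txt")

writeLines(c("AA AB BB", "AB AB AA"), marker_file)             # 2 individuals x 3 SNPs
writeLines(c("trait1 sex", "10.2 M", "11.5 F"), pheno_file)    # header + 2 individuals
writeLines(c("snp chr pos", "s1 1 100", "s2 1 200", "s3 2 50"), map_file)  # header + 3 SNPs

fields <- count.fields(marker_file)   # number of genotypes on each row
n_ind  <- length(fields)              # rows = individuals
n_snp  <- unique(fields)              # should be a single value

pheno <- read.table(pheno_file, header = TRUE)
map   <- read.table(map_file, header = TRUE)

stopifnot(length(n_snp) == 1)         # every marker row has the same length
same_individuals <- nrow(pheno) == n_ind
same_snps        <- nrow(map) == n_snp
```

Matching counts do not prove that the individuals or SNPs are in the same order, so the ordering still needs to be confirmed from how the files were produced.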

Where can I go for help?

Your best resource for help is this website. However, if you still need help, email us at eaglehelp@csiro.au

Using OpenGUI

When I click on Choose File, the file browser does not start where I want it to start

The file browser is based on functionality contained in the shinyFiles package. One of the inherent features of the shinyFiles package is that a user cannot traverse up a directory structure beyond the starting directory. We have tried to choose a starting directory that we hope will suit most users. If it does not, a user can always enter the full path name in the file box manually.

When I use OpenGUI, nothing happens

Make sure you are running the latest version of R along with updated versions of the packages. Use the R function update.packages(checkBuilt=TRUE, ask=FALSE) to update your packages to the latest version.

When I use OpenGUI, my web browser opens but everything is greyed out and I cannot click on any of the widgets

Go back to the R window/console from which you issued the OpenGUI command. If an error has occurred, it will be printed there. The most likely cause is that your packages have been installed under a version of R that is different to the one being used. Run the following to update your installed packages:

update.packages(checkBuilt=TRUE, ask=FALSE)
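To see which packages are affected before updating, the R version each package was built under can be compared with the running version. This is a small base R sketch rather than anything Eagle provides. The helper name `series` is made up here, and the comparison is on the major.minor series, which is what `checkBuilt=TRUE` considers.

```r
# List installed packages that were built under a different major.minor series of R.
pkgs    <- installed.packages()
built   <- pkgs[, "Built"]                                   # e.g. "4.3.1"
current <- paste(R.version$major, R.version$minor, sep = ".")

# Hypothetical helper: keep only the major.minor part of a version string.
series <- function(v) sub("^([0-9]+\\.[0-9]+).*$", "\\1", v)

stale <- unname(pkgs[series(built) != series(current), "Package"])
stale   # packages that update.packages(checkBuilt=TRUE, ask=FALSE) would rebuild
```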

Reading in Marker Genotypes

My marker data has heterozygous genotypes of the form 12 and 21. What do I do?

Unfortunately, Eagle can only handle a single value for heterozygous genotypes. This means that you will need to replace one of the heterozygous genotypes with the other in your marker data set. This might change in the future if there is a demand for this functionality.
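One way to do the replacement is in R before the file is read by Eagle. The sketch below uses only base R. The toy file and the choice to keep 12 rather than 21 are illustrative, and it reads the whole file into memory, so it is only suitable for marker files that fit in RAM.

```r
# Hypothetical marker file with both heterozygous codes, 12 and 21.
marker_file <- tempfile(fileext = ".txt")
writeLines(c("11 12 22", "21 11 12", "22 21 21"), marker_file)

# Read the genotypes as text so that codes are not converted to numbers.
geno <- as.matrix(read.table(marker_file, colClasses = "character"))

# Collapse the two heterozygous codes into one.
geno[geno == "21"] <- "12"

# Write a space-separated file with no row or column headings.
recoded_file <- tempfile(fileext = ".txt")
write.table(geno, recoded_file, quote = FALSE,
            row.names = FALSE, col.names = FALSE)
```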

What types of marker data can Eagle handle?

Eagle can deal with two types of marker data: genotypic and allelic. If marker genotypes are available, then they need to be in a plain space-separated text file where the rows are the individuals and the columns are the SNPs. The file should not contain row or column headings. The genotypes can be any alphanumeric value, as long as these same alphanumeric values are used across all the loci. If allelic data are available, then they need to be in PLINK ped form. Go to our Quick Start guide for further details.

Am I allowed to have missing marker data?

Yes. If genotype data are available, then any alphanumeric value can be used to denote missing data (as long as only a single value for missingness is used across all the loci). If allelic data are available, PLINK only allows 0 or - to denote missing alleles. However, ideally, missing marker genotypes should be imputed with programs like BEAGLE or fastPHASE before analysis.

If I have missing marker genotypes, what happens?

Eagle sets the missing marker genotypes to the heterozygous genotype AB. Since Eagle assumes an additive locus model, setting a missing genotype to AB is equivalent to imputing a genotype that has no effect on a trait. If there are a large number of missing genotypes, this will reduce the power for detecting marker-trait associations. A better strategy is to impute the missing marker genotypes prior to analysis with dedicated imputation software such as BEAGLE or fastPHASE.
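To judge whether missingness is large enough to matter, the proportion of missing genotypes per SNP can be tabulated before analysis. This is an illustrative base R sketch. The missing code "X", the toy genotypes, and the 0.5 cutoff are all assumptions made for the example, not values that Eagle uses.

```r
# Hypothetical genotype matrix: rows are individuals, columns are SNPs.
missing_code <- "X"
geno <- matrix(c("AA", "AB", "X",
                 "X",  "BB", "X",
                 "AB", "AA", "BB",
                 "AA", "X",  "X"),
               ncol = 3, byrow = TRUE)

# Proportion of individuals with a missing genotype at each SNP.
missing_rate <- colMeans(geno == missing_code)

# SNPs whose missing rate exceeds an arbitrary, user-chosen cutoff.
flagged <- which(missing_rate > 0.5)
```

SNPs flagged in this way are the ones for which imputation beforehand is most likely to be worthwhile.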

Can Eagle handle huge marker data sets?

Yes. Eagle can analyse marker data larger than the memory capacity of a computer by using out-of-memory matrix calculation.

I cannot get my marker data to read in

A common source of error is an unequal number of elements across the rows of a marker file. Eagle will capture this error, reporting on which row contains the unequal number of elements.

Check that there are no spaces at the beginning of a line.

Replace any tabs with spaces.

We've also encountered problems when transferring files from a Windows system to a Unix system and vice versa. This is because Windows and Unix text files differ slightly in how they mark the end of a line. When transferring files between different platforms, it is good practice to use a file conversion program first, such as dos2unix or unix2dos.
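The checks listed above can be run from R before the file is given to Eagle. The following base R sketch is a suggestion only and is separate from Eagle's own error reporting. The toy file deliberately contains one example of each problem.

```r
# Hypothetical marker file with a leading space (row 2), a tab (row 3),
# and a short row (row 4).
marker_file <- tempfile(fileext = ".txt")
writeLines(c("0 1 2", " 0 1 2", "0\t1 2", "0 1"), marker_file)

lines  <- readLines(marker_file)
fields <- count.fields(marker_file)       # whitespace-separated fields per row

# Treat the most common row length as the expected one.
expected      <- as.integer(names(which.max(table(fields))))
bad_length    <- which(fields != expected)
leading_space <- which(grepl("^ ", lines))
has_tab       <- which(grepl("\t", lines, fixed = TRUE))

# Windows line endings show up as carriage-return bytes (value 13).
bytes  <- readBin(marker_file, what = "raw", n = file.size(marker_file))
has_cr <- any(bytes == as.raw(13))
```

Note that `count.fields` skips blank lines by default, so the row numbers it implies only line up with `lines` when the file has no empty rows.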

Can Eagle handle dominant marker loci and loci with more than two alleles?

Yes. Suppose you have dominant marker data with genotype codes 0 and 1 for absence and presence, respectively. Then, when the marker data are being read with `Read Genotypes`, set the parameter AA to 0 and BB to 1 (or vice versa) but leave AB blank. If you have a multi-allelic locus, say with 10 segregating alleles, then turn this locus into 10 dominant loci and treat these loci as described above (remembering to modify the marker map file for the extra loci accordingly).
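The conversion of a multi-allelic locus into dominant loci can be sketched in base R as follows. The allele names and the two-column layout (one column per allele of a diploid individual) are assumptions for the example. Each new column records whether an individual carries the allele (1) or not (0).

```r
# Hypothetical multi-allelic locus: one row per individual, two alleles each.
alleles <- matrix(c("a1", "a2",
                    "a2", "a2",
                    "a3", "a1"),
                  ncol = 2, byrow = TRUE)

# One dominant (presence/absence) locus per distinct allele.
allele_set <- sort(unique(as.vector(alleles)))
dominant <- sapply(allele_set, function(a) {
  as.integer(rowSums(alleles == a) > 0)
})
dominant
```

Each column of `dominant` would then be added to the marker file as its own locus, with a matching extra row in the marker map file.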

Reading in Phenotypic Data

Can Eagle handle missing trait data and/or missing fixed effects data?

Yes. Individuals with missing trait and/or fixed effects data are removed from the analysis. However, an individual is removed only if the missing values are in the trait or fixed effects being used in the analysis. Missing values in columns that are not part of the analysis do not cause an individual to be removed.
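The rule can be illustrated with a small base R example. This is only a sketch of the behaviour described above, not Eagle's internal code, and the column names are made up.

```r
# Hypothetical phenotype data with missing values in several columns.
pheno <- data.frame(
  trait1 = c(10.2, NA, 9.8, 11.1),
  trait2 = c(NA, 3.1, 2.9, 3.0),
  sex    = c("M", "F", NA, "F")
)

# Suppose the analysis uses trait1 as the trait and sex as a fixed effect.
used <- c("trait1", "sex")

# An individual is kept only if every column used in the analysis is observed.
keep <- complete.cases(pheno[, used])
keep
```

The first individual is kept even though trait2 is missing, because trait2 is not part of this analysis.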

Can my phenotypic file contain multiple traits?

Yes.

Can my phenotypic file contain fixed effects that I may not use in an analysis?

Yes.

Do I need to be concerned with the order of the rows in my phenotypic file?

Yes.

If the trait contains n measurements on n individuals, then the ordering of the individuals in the phenotypic file must be the same as the ordering of the individuals in the marker file.

If the trait contains n measurements on m individuals, where n is greater than m, then a Z matrix (n x m) is needed to associate the n measurements to the m individuals. The (column) order of the m individuals in the Z matrix is assumed to be the same as the ordering of individuals in the marker file.
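A Z matrix of this form can be built in base R from a vector that records which individual each measurement belongs to. The identifiers below are hypothetical, and the layout of the written file (space-separated, no headings) is an assumption that should be checked against the Eagle documentation.

```r
# Hypothetical example with m = 3 individuals and n = 5 measurements.
ids     <- c("ind1", "ind2", "ind3")                    # same order as the marker file
meas_id <- c("ind1", "ind1", "ind2", "ind3", "ind3")    # individual for each measurement

# n x m matrix: entry (i, j) is 1 if measurement i belongs to individual j.
Z <- outer(meas_id, ids, "==") * 1L
Z

# Write the matrix as plain text (assumed layout: no row or column headings).
z_file <- tempfile(fileext = ".txt")
write.table(Z, z_file, quote = FALSE, row.names = FALSE, col.names = FALSE)
```

Every row of Z should contain exactly one 1, which gives a quick sanity check with `rowSums(Z)`.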

Running Eagle

How can I get the latest version of Eagle?

If Eagle has not been previously installed, then start R and at the R prompt, type

install.packages("Eagle", dependencies=TRUE)

Eagle is dependent upon several other packages. This command will install Eagle, along with any missing packages upon which Eagle is dependent.

If Eagle is already installed, but you want the latest version of Eagle and the packages upon which Eagle is dependent, then at the R prompt, type

update.packages(checkBuilt=TRUE, ask=FALSE) 

Do I need to update Eagle if I have updated my version of R?

Yes.

We have seen some strange behaviour, especially with `OpenGUI()`, when Eagle and its dependencies have been installed under different versions of R. Whenever a newer version of R is installed, it is good practice to update its packages with

update.packages(checkBuilt=TRUE, ask=FALSE)

What platforms will Eagle run on?

Eagle will run on the same platforms that R runs on, namely Linux, OS X (Mac), and Windows.

Performing Genome-wide Association Mapping

Am I restricted to only analysing continuous traits or can I also analyse discrete/disease traits?

No, you are not restricted to only analysing continuous (quantitative) traits. However, there may be a loss in power for detecting marker-trait associations for a discrete trait. Eagle is based on linear mixed models. Linear mixed models assume a response (or trait) is normally distributed but they are robust to violations of their assumptions.

Am I allowed to have interactions between my fixed effects?

Yes. However, when Eagle is run via OpenGUI, only main effects can be fitted. One solution is to add an extra column to the phenotypic data file that is the interaction of the two effects that are of interest. Then, include this column as an extra fixed effect in the analysis. Alternatively, from the R prompt, use the `AM` function and define the fixed effect part of the model by setting fformula. For example, if you have two effects, called x1 and x2 say, then to fit a model that has x1 and x2 as main effects and their interaction, set the fformula option to fformula="x1*x2" in the AM() function.
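For the first approach, the extra column can be created in R before the phenotypic file is written. The sketch below assumes that x1 and x2 are categorical effects and that the phenotypic file has a header row. The column name x1_x2 and the toy values are made up. For two numeric covariates, the product x1 * x2 would be used instead.

```r
# Hypothetical phenotypic data with two categorical fixed effects.
pheno <- data.frame(
  y  = c(1.2, 0.8, 1.5, 1.1),
  x1 = c("a", "a", "b", "b"),
  x2 = c("u", "v", "u", "v")
)

# New column with one level per combination of x1 and x2.
pheno$x1_x2 <- interaction(pheno$x1, pheno$x2, sep = "_")

# Write the augmented phenotypic file (assumed layout: header row, space-separated).
pheno_file <- tempfile(fileext = ".txt")
write.table(pheno, pheno_file, quote = FALSE, row.names = FALSE)
```

In OpenGUI, x1, x2, and x1_x2 would then all be selected as fixed effects.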

My analysis doesn't seem to be using multiple threads/cpu

Not all parts of an Eagle analysis are parallelized. However, you should see multiple threads being used at different times throughout each iteration of the model building process. If you do not, it is most likely that your version of R is not making use of a multi-threaded BLAS library such as MKL or OpenBLAS. Look at our instruction notes on how to install a multi-threaded version of R.
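A quick way to see which BLAS library R is linked against, and whether it appears to use more than one thread, is sketched below in base R. It assumes R 3.4.0 or later, where `sessionInfo()` and `La_library()` report these paths. The matrix size is arbitrary, and the timing comparison is a rough indication only.

```r
# Which BLAS/LAPACK libraries is this R session using?
info        <- sessionInfo()
blas_path   <- info$BLAS          # path to the BLAS library in use
lapack_path <- La_library()       # path to the LAPACK library in use
n_cores     <- parallel::detectCores()

# Time a BLAS-heavy operation on an arbitrary test matrix.
set.seed(1)
n <- 500
a <- matrix(rnorm(n * n), n, n)
timing  <- system.time(crossprod(a))
elapsed <- timing[["elapsed"]]
timing
```

With a multi-threaded BLAS, the user time reported for larger matrices is often noticeably greater than the elapsed time. With the reference BLAS that ships with R, the two are usually about the same.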