A: If you use UKB Neale MetaXcan Data in your work, please cite:
Integrating tissue specific mechanisms into GWAS summary results. Alvaro Barbeira, Scott P Dickinson, Jason M Torres, Eric S Torstenson, Jiamao Zheng, Heather E Wheeler, Kaanan P Shah, Todd Edwards, GTEx Consortium, Dan Nicolae, Nancy J Cox, Hae Kyung Im. BioRxiv. 2017. doi: https://doi.org/10.1101/045260.
A: Within each `.tar.gz` archive you will find three kinds of files:
A: This can be accomplished by running the command `tar -xzvf filename.tar.gz`. It will create a folder named CODE-PHENOTYPE and extract all the files into it.
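If you have downloaded many archives, the same `tar` call can be wrapped in a loop. The following is a minimal sketch, not part of the original instructions: the function name `extract_all` is hypothetical, and it assumes all the `.tar.gz` files sit in one directory.

```shell
# Hypothetical helper: extract every .tar.gz archive found in a directory.
extract_all() {
    archive_dir="${1:-.}"
    for archive in "$archive_dir"/*.tar.gz; do
        # If nothing matches, the pattern stays literal; skip it.
        [ -e "$archive" ] || continue
        # -C makes tar extract next to the archive instead of the current directory.
        tar -xzf "$archive" -C "$archive_dir"
    done
}
```

For example, `extract_all destiny_folder` would unpack each archive in `destiny_folder` into its own folder there.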
A: On the home page, you can explore the AWS S3 bucket containing the S-PrediXcan files. You can enter a pattern in the “Search” box to show only the files matching that pattern, and download the files by clicking their links individually.
For your convenience, we also provide this Google spreadsheet, which contains 1) the UK Biobank codes for the phenotypes, 2) the phenotype description, 3) file prefixes, 4) S3 links for each of the files and 5) wget commands to download them. These fields may be useful for scripting.
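As one way to script the downloads, the sketch below assumes you have copied the spreadsheet's S3 link column into a plain-text file with one URL per line. The file layout, the function name `download_links`, and the `WGET_CMD` override are all assumptions made for illustration; they are not provided by the spreadsheet itself.

```shell
# Hypothetical helper: download every URL listed in a text file (one per line).
# WGET_CMD is an illustrative override; set it to "echo" to print the commands
# instead of running them.
download_links() {
    links_file="$1"
    dest="$2"
    mkdir -p "$dest"
    # The "|| [ -n ... ]" part keeps a last line that lacks a trailing newline.
    while IFS= read -r url || [ -n "$url" ]; do
        # Skip blank lines.
        [ -n "$url" ] || continue
        # -c resumes partial downloads; -P sets the target directory.
        ${WGET_CMD:-wget} -c -P "$dest" "$url"
    done < "$links_file"
}
```

A call such as `download_links links.txt destiny_folder` would then fetch each listed file in turn.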
If you prefer to use awscli (the AWS command line interface) to download the files, we also provide examples for some common use cases. To install awscli, you can follow the instructions given here; there’s no need to set up credentials (see the NOTE below). In all cases, replace `destiny_folder` with the path where you want to download the data.
To download the whole bucket (i.e. all 2419 phenotypes), run:
aws --no-sign-request --region=us-east-1 s3 cp s3://gene2pheno/ destiny_folder --recursive
To download all the files with UK Biobank code 20001 (cancer phenotypes), run:
aws --no-sign-request --region=us-east-1 s3 cp s3://gene2pheno/ destiny_folder --recursive --exclude "*" --include "20001*"
NOTE: In general, it’s necessary to have AWS credentials to run `aws` commands, but the `--no-sign-request` option tells the CLI not to look for credentials. It works in this case because the AWS S3 bucket is public.
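The code-specific command can be repeated for several UK Biobank codes. The sketch below is one possible wrapper, not an official script: `download_codes`, `AWS_CMD`, and `DRYRUN` are hypothetical names introduced here, and any code other than 20001 is only a placeholder. When `DRYRUN` is set, it adds the CLI's `--dryrun` flag, which lists the operations without copying anything.

```shell
# Hypothetical helper: run the per-code download once for each code given.
# AWS_CMD is an illustrative override; set it to "echo" to print the commands.
# Set DRYRUN to any non-empty value to pass --dryrun to the AWS CLI.
download_codes() {
    dest="$1"
    shift
    for code in "$@"; do
        # Exclude everything, then re-include only files starting with this code.
        ${AWS_CMD:-aws} --no-sign-request --region=us-east-1 \
            s3 cp s3://gene2pheno/ "$dest" --recursive \
            --exclude "*" --include "${code}*" ${DRYRUN:+--dryrun}
    done
}
```

For instance, `DRYRUN=1 download_codes destiny_folder 20001` previews the files that the 20001 command above would copy.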
A: The results were generated using the S-PrediXcan software.
The prediction models for gene expression that we used here are based on the GTEx v6p release of RNA-seq data (44 models), as well as the DGN study (1 model). These models were generated by our group and are publicly available as SQLite databases. To download these models, or to get more information on how they are generated, you can access the PredictDB portal. Coming soon: the v7 release of the GTEx models.
The software was run on the CRI HPC at the University of Chicago. The total running time was around 12 hours (about 1.5 minutes for each phenotype/tissue pair, parallelized across roughly 300 nodes).
The code for the multitissue meta-analysis will be released soon, as we are still working on it.