Rapid and accurate haplotype phasing and missing-data inference for whole-genome association studies by use of localized haplotype clustering.

PISSN

0002-9297

Publication Dbxref

PMID:17924348

Title

Rapid and accurate haplotype phasing and missing-data inference for whole-genome association studies by use of localized haplotype clustering.

Publication Type

Journal Article

Additional Publication Type(s)

Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't

Series Name

American journal of human genetics

Volume

81

Publication Year

2007

Issue

5

Page Numbers

1084-97

DOI

10.1086/521987

Journal Abbreviation

Am J Hum Genet

Publication Date

2007 Nov

Unique Local Identifier

Browning SR, Browning BL. Rapid and accurate haplotype phasing and missing-data inference for whole-genome association studies by use of localized haplotype clustering.. American journal of human genetics. 2007 Nov; 81(5):1084-97.

Citation

Browning SR, Browning BL. Rapid and accurate haplotype phasing and missing-data inference for whole-genome association studies by use of localized haplotype clustering. American journal of human genetics. 2007 Nov; 81(5):1084-97.

ISSN

0002-9297

Language Abbr

eng

Publication Model

Print-Electronic

Authors

Browning SR, Browning BL

Language

English

URL

https://doi.org/10.1086/521987

Journal Country

United States

Abstract

Whole-genome association studies present many new statistical and computational challenges due to the large quantity of data obtained. One of these challenges is haplotype inference; methods for haplotype inference designed for small data sets from candidate-gene studies do not scale well to the large number of individuals genotyped in whole-genome association studies. We present a new method and software for inference of haplotype phase and missing data that can accurately phase data from whole-genome association studies, and we present the first comparison of haplotype-inference methods for real and simulated data sets with thousands of genotyped individuals. We find that our method outperforms existing methods in terms of both speed and accuracy for large data sets with thousands of individuals and densely spaced genetic markers, and we use our method to phase a real data set of 3,002 individuals genotyped for 490,032 markers in 3.1 days of computing time, with 99% of masked alleles imputed correctly. Our method is implemented in the Beagle software package, which is freely available.

Database Reference Annotations

PMID:17924348

Is Obsolete

False