Complex Traits Intro
• Many traits (over 6000 in humans) are inherited in a simple Mendelian fashion: autosomal or sex-linked, dominant or recessive. These patterns are fairly easy to see when pedigrees are examined.
• However, a large number of traits don't easily fit the Mendelian patterns. We call them "multifactorial":
  – strong environmental effects influence expression
  – oligogenic: several genes interact to produce the mutant phenotype
  – polygenic: many genes, each with a small effect
• Some are important physical diseases: diabetes, multiple sclerosis, etc. Other complex traits are more behavioral: schizophrenia, for example.
• Some multifactorial traits are quantitative in nature: height, for example.
• Others have only two basic states, normal and diseased, but there is an underlying distribution of contributing factors: a threshold trait.

More Intro
• Locating genes responsible for these traits has been much less successful than mapping Mendelian traits.
  – Sometimes it is hard to get good diagnostic criteria for a disease, especially for psychological conditions. The Diagnostic and Statistical Manual of Mental Disorders, 4th edition (DSM-IV), published in 1994, is the standard list of diagnostic criteria, but applying it to real cases is still a matter of subjective judgment.
  – What we perceive as a single disease may have multiple genetic causes.
  – Also, multiple alleles may cause identification problems.
• General approach:
  1. Determine whether there is good evidence for a significant genetic component to the trait, using recurrence risk, twin studies, and adoption studies.
  2. Attempt to find general features of the mode of inheritance and the degree of environmental effect by segregation analysis.
  3. Attempt to locate chromosome segments of interest by non-parametric linkage analysis methods such as affected sib pairs and linkage disequilibrium.
  4. Identify and clone the responsible genes.
Heritability
• Complex traits usually have an environmental component as well as a genetic component to their causes.
  – Phenotype is due to a combination of genetics and environment.
• To separate these components, we examine the variation in a trait, as measured by its variance:
  – total variance = variance due to genetics + variance due to environment + twice their covariance
  – $V_T = V_G + V_E + 2\,\mathrm{Cov}(G,E)$
• Within a family, genetic variance is smaller because relatives share genes; within an MZ twin pair, $V_G = 0$. So studies seeking to isolate genetic influences use families to separate $V_G$ from $V_E$.
• $H = V_G / V_T$ is the (broad-sense) heritability of a trait. Unfortunately it doesn't separate dominance effects from additive ones, and it doesn't account for gene-environment covariance: e.g. intelligent parents provide an enriched environment for their children, which boosts IQ scores.
• $V_G = V_A + V_D$
  – Dominance variance $V_D$ is due to interactions between alleles; it is not passed on to offspring.
  – Additive variance $V_A$ is the individual effect of alleles, which is passed on to offspring.
  – $h = V_A / V_T$ is the narrow-sense heritability. It is used in breeding; it is never larger than H and is more realistic, but it is not used much in humans.

Runs in the Family
• The first question of importance: how do we know a complex trait has a genetic component?
[Figure: Recurrence Risk for Multiple Sclerosis]
• Recurrence risk: the chance of a person with an affected sibling (or other relative) getting the disease, compared with the chance of a random person getting the disease. Expressed as a ratio, it is symbolized by λ: $\lambda_S$ for siblings, $\lambda_R$ for relatives in general.
• Recurrence risk depends on the degree of relationship:
  – 1st degree: DZ twins, siblings, parent/child: share 50% of genes
  – 2nd degree: uncle/nephew, grandparent/grandchild: share 25% of genes
  – 3rd degree: first cousins: share 12.5% of genes
• Recurrence risks are determined strictly from observed data: they do not involve any knowledge of the underlying genetics at all. For this reason they are sometimes called "empiric risk".
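The variance partition, the two heritability ratios, and the recurrence risk ratio λ are all simple arithmetic. The Python sketch below encodes those definitions; the function names are hypothetical (not from any genetics package), and the variance components and risks in the usage lines are invented purely for illustration.

```python
# Minimal sketch of the heritability and recurrence-risk definitions.
# All input numbers below are hypothetical, chosen only to illustrate.


def total_variance(v_g: float, v_e: float, cov_ge: float = 0.0) -> float:
    """V_T = V_G + V_E + 2 Cov(G, E)."""
    return v_g + v_e + 2.0 * cov_ge


def broad_heritability(v_a: float, v_d: float, v_e: float, cov_ge: float = 0.0) -> float:
    """H = V_G / V_T, where V_G = V_A + V_D."""
    v_g = v_a + v_d
    return v_g / total_variance(v_g, v_e, cov_ge)


def narrow_heritability(v_a: float, v_d: float, v_e: float, cov_ge: float = 0.0) -> float:
    """h = V_A / V_T; cannot exceed H because V_D >= 0."""
    return v_a / total_variance(v_a + v_d, v_e, cov_ge)


def recurrence_risk_ratio(risk_in_relatives: float, population_risk: float) -> float:
    """lambda_R: disease risk in relatives of an affected person / population risk."""
    return risk_in_relatives / population_risk


if __name__ == "__main__":
    v_a, v_d, v_e = 40.0, 10.0, 50.0                     # hypothetical variance components
    print(broad_heritability(v_a, v_d, v_e))              # 0.5
    print(narrow_heritability(v_a, v_d, v_e))             # 0.4
    # With Cov(G, E) > 0, V_T grows: here H becomes 50 / 110.
    print(broad_heritability(v_a, v_d, v_e, cov_ge=5.0))
    # Hypothetical risks: 10% in siblings of affected people vs 1% in the population.
    print(recurrence_risk_ratio(0.10, 0.01))              # about 10
```

Because $V_D \ge 0$, h can never exceed H, which is why narrow-sense heritability is the smaller (or equal) of the two.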
Twin Studies
• As pointed out by Darwin's cousin Francis Galton, monozygotic (MZ) twins are genetically identical, while dizygotic (DZ) twins share half of their genes, like any other siblings. However, MZ and DZ twins are often raised in very similar environments, especially if they are of the same sex. Thus twins are naturally occurring experiments in the relative effects of genetics and environment.
  – A further problem: DZ twins of the same sex often look quite different, so they may not really get the same degree of similar treatment as MZ twins.
• The relevant statistic: concordance.
  – If both twins have the disease, they are concordant.
  – If one has it and the other doesn't, they are discordant.
• Traits with a large genetic component will show a higher concordance rate for MZ twins than for DZ twins.
• For a 100% genetic trait, concordance should be 1.0 in MZ twins and 0.5 in DZ twins of the same sex.
[Figure: concordance ratio (MZ/DZ) for HPT, RP, MIG, and CAD; the higher the ratio, the more genetic the trait. HPT: hypertension; RP: Raynaud's phenomenon (fingertips lose circulation when exposed to cold or strong emotion); MIG: migraines; CAD: coronary artery disease]

Zygosity Diagnosis
• Fetal membranes: the chorion is the outer embryonic membrane; it is derived from the fertilized egg.
  – All DZ twins have separate chorions, but ¾ of MZ twins share a single chorion.
  – Some MZ twins share the same amnion (inner membrane) while others don't. Some DZ twins share a placenta.
• Anthropometric traits (i.e. body measurements): fingerprints (though those of MZ twins are not necessarily identical), eye and hair color, facial structure. An experienced observer can diagnose zygosity very accurately.
• Genotype: blood groups, DNA markers. These should be identical in MZ twins. Many DNA tests are available to parents nowadays.
• Complications: MZ twins will have different patterns of X chromosome inactivation (if female) and different sets of (rearranged) immunoglobulin genes, and they will have been exposed to slightly different environments since the moment the initial embryo split. They are not 100% identical, and in practice MZ twins are not 100% concordant for complex traits.

Adoption Studies
• Adopted children are good for separating genetic effects from environmental effects.
• Two basic ideas:
  1. If you look at adopted people with a genetic disease, does that disease run in their biological family or in their adoptive family?
  2. If the biological parents of an adopted child have a genetic disease, does the child get the disease despite being raised in a different family?
  – Problem: adoption is not a random process; a lot of effort often goes into matching characteristics between the birth family and the adoptive family.
  – Also, studying the genetics of the birth family can be difficult due to privacy issues.

Segregation Analysis
• A method of pedigree analysis that can estimate the number of major genes contributing to a disease and their mode of inheritance, as well as incomplete penetrance and environmental components.
• A maximum likelihood method similar to lod scores: vary a large number of parameters systematically and find the combination that best fits the data.
• But more parameters = more degrees of freedom = a need for larger amounts of data.
• And a truism from the world of computing: garbage in, garbage out. If the options and parameters you use in segregation analysis don't cover the true situation, you will get a wrong answer.

Ascertainment Bias
• Segregation analysis requires a lot of data from affected families. The problem is that a family with no affected members will probably not be counted, even if their children are at risk. This is called "complete (truncate) ascertainment": you see all families with at least one affected child.
  – Another model of ascertainment bias ("single selection") starts from the premise that the chance of seeing an affected family is proportional to the number of affected children. (Recalling a bit of military dogma: something happening once is happenstance, twice is coincidence, three times is enemy action.)
• For example, two heterozygotes for a recessive trait have 2 children: Aa × Aa.
  – There is a 1/4 chance of an affected child; thus there is a 1/16 chance that both children are affected, a 6/16 chance that exactly one child is affected, and a 9/16 chance that neither is affected.
  – In 16 possible families (2 children each), there will be 32 children, of whom 8 are affected (= 1/4).
  – However, the families with no affected children will never be seen, so if you look only at families with at least one affected child, there are 7 families, 14 children, and 8 affected children (= 4/7).
  – This fraction is very different from 1/4, and might lead you to suspect a dominant rather than a recessive trait, or some other theory.
  – Mathematical methods exist to correct this bias.
• Another approach: select families at random, as visitors to a health clinic, for example.

Linkage Analysis
• Unfortunately, the standard lod score method doesn't work well for complex traits, because it requires a definite model of how the trait is inherited: the first step in lod score mapping is to determine the expected frequency of offspring phenotypes as a function of the recombination fraction.
• Non-parametric methods look for chromosome segments shared by affected individuals and don't rely on a genetic model:
  – affected sib pair analysis
  – linkage disequilibrium

Affected Sib Pair Analysis
• If two siblings are both affected by a genetic disease, they will (in most cases) share a region of chromosome surrounding the disease gene. This segment is "identical by descent" (IBD): it was derived from a common ancestor, their parent.
  – If it is uncertain whether the siblings inherited the shared segment from the same parental copy, it falls into the less useful category of "identical by state" (IBS).
• Use many co-dominant markers to find IBD regions among many affected sib pairs.
• This usually results in a large region, a significant fraction of an entire chromosome: too big for positional cloning.
• Also, if more than one gene causes the trait, the data, however large, will never converge on a single chromosome region.

Linkage Disequilibrium
• Gene mapping using recombination methods (such as affected sib pair analysis) suffers from not having enough crossovers in one generation to localize a gene very well.
• Linkage disequilibrium uses crossovers that have occurred over many generations.
• The idea is that if you start with a single mutation, over many generations crossing over will separate the region around the mutation from most of the rest of the genome.
  – Regions of chromosome distant from the disease mutation will become randomized.
  – However, right near the mutation, random crossovers will not have separated the disease locus from its surrounding haplotype: a particular DNA haplotype will be in disequilibrium with the disease trait.
  – The trick is to find that haplotype.
  – The further back in time the mutation occurred, the smaller the region of disequilibrium.

More Linkage Disequilibrium
• A major complication: it turns out that whole blocks of chromosome get inherited together over many generations, because crossing over isn't completely random. This means that genes occur in LD blocks separated by recombination hotspots.
• Another problem: LD methods depend on there being only a single original disease mutation that occurred on a particular haplotype. Multiple mutations will each have their own LD haplotype.
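The Aa × Aa example on the Ascertainment Bias slide can be checked with exact fractions. The sketch below assumes equal sibship sizes and complete (truncate) ascertainment. It reproduces the apparent 4/7, then applies one simple correction: under those assumptions the expected observed fraction is p / (1 - (1 - p)^s), which is inverted numerically. The function names are hypothetical, and this is only an illustration; real segregation analysis software handles unequal sibship sizes and other ascertainment schemes.

```python
from fractions import Fraction
from math import comb


def sibship_distribution(s: int, p: Fraction) -> list[Fraction]:
    """P(k affected among s children) for k = 0..s, each child affected with probability p."""
    return [comb(s, k) * p**k * (1 - p) ** (s - k) for k in range(s + 1)]


def ascertained_fraction(s: int, p: Fraction) -> Fraction:
    """Apparent affected fraction when families with no affected child are never seen."""
    dist = sibship_distribution(s, p)
    affected = sum(k * dist[k] for k in range(1, s + 1))
    children = s * sum(dist[1:])
    return affected / children


def corrected_p(s: int, observed: float, tol: float = 1e-12) -> float:
    """Solve observed = p / (1 - (1 - p)**s) for p by bisection.

    Assumes complete ascertainment and equal sibship size s. The right-hand
    side increases with p, so bisection on (0, 1) finds the unique root.
    """
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if mid / (1 - (1 - mid) ** s) < observed:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    p = Fraction(1, 4)                        # Aa x Aa recessive segregation ratio
    print(*sibship_distribution(2, p))        # 9/16 3/8 1/16
    print(ascertained_fraction(2, p))         # 4/7
    print(round(corrected_p(2, 4 / 7), 6))    # 0.25
```

Feeding the biased 4/7 back through the correction recovers the true segregation ratio of 1/4 for two-child families.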
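The Linkage Analysis slide notes that the lod score method needs a definite model, and the breast cancer study below reports a lod score of 5.98. In the simplest setting, phase-known and fully informative meioses where recombinants can be counted directly, the lod score is log10 of the likelihood at recombination fraction θ divided by the likelihood at θ = 0.5. The sketch below assumes exactly that idealized setting (in real pedigrees, counting recombinants is where the inheritance model is needed); the function names and example counts are hypothetical.

```python
from math import log10


def lod(theta: float, recombinants: int, meioses: int) -> float:
    """LOD(theta) = log10[theta^R (1 - theta)^(N - R) / 0.5^N].

    Assumes N phase-known, fully informative meioses with R recombinants.
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    r, nr = recombinants, meioses - recombinants
    if theta == 0.0 and r > 0:
        return float("-inf")  # any recombinant is impossible at theta = 0
    log_l = (r * log10(theta) if r else 0.0) + (nr * log10(1.0 - theta) if nr else 0.0)
    return log_l - meioses * log10(0.5)


def max_lod(recombinants: int, meioses: int, steps: int = 50) -> tuple[float, float]:
    """Grid search over theta = 0, 0.01, ..., 0.5; returns (best theta, max LOD)."""
    grid = [i / (2 * steps) for i in range(steps + 1)]
    best = max(grid, key=lambda t: lod(t, recombinants, meioses))
    return best, lod(best, recombinants, meioses)


if __name__ == "__main__":
    # Hypothetical data: 1 recombinant among 10 meioses.
    theta_hat, z = max_lod(1, 10)
    print(theta_hat, round(z, 2))     # 0.1 1.6
    # 10 meioses with no recombinants give about 3.01, just past the usual threshold of 3.
    print(round(lod(0.0, 0, 10), 2))  # 3.01
```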
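For the Affected Sib Pair slides: at a locus unlinked to the disease, a sib pair shares 0, 1, or 2 parental alleles IBD with probabilities 1/4, 1/2, and 1/4, because each sib independently receives one of the father's two alleles and one of the mother's two. Near a disease gene, affected pairs should share more. The sketch below simulates that null distribution and computes two common tests on IBD counts at one marker: a goodness-of-fit chi-square and a "mean test" on the proportion of alleles shared. The counts are hypothetical, the function names are illustrative, and the code assumes IBD status has already been inferred from the co-dominant markers.

```python
import random
from math import sqrt

NULL_IBD = (0.25, 0.5, 0.25)  # P(0, 1, 2 alleles IBD) for a sib pair at an unlinked locus


def simulate_null_ibd(n_pairs: int, rng: random.Random) -> list[int]:
    """Count sib pairs sharing 0, 1, or 2 alleles IBD when transmission is random."""
    counts = [0, 0, 0]
    for _ in range(n_pairs):
        # Each parent transmits copy 0 or 1 to each sib independently.
        shared_paternal = rng.randrange(2) == rng.randrange(2)
        shared_maternal = rng.randrange(2) == rng.randrange(2)
        counts[shared_paternal + shared_maternal] += 1
    return counts


def ibd_chi_square(n0: int, n1: int, n2: int) -> float:
    """Goodness-of-fit chi-square (2 df) of observed IBD counts against 1/4, 1/2, 1/4."""
    n = n0 + n1 + n2
    return sum((o - n * p) ** 2 / (n * p) for o, p in zip((n0, n1, n2), NULL_IBD))


def mean_test_z(n0: int, n1: int, n2: int) -> float:
    """Mean test: proportion of alleles shared IBD vs the null 0.5; null variance 1 / (8n)."""
    n = n0 + n1 + n2
    pi_hat = (n1 + 2 * n2) / (2 * n)
    return (pi_hat - 0.5) / sqrt(1 / (8 * n))


if __name__ == "__main__":
    print(simulate_null_ibd(10_000, random.Random(1)))  # roughly 2500, 5000, 2500
    # Hypothetical affected sib pairs at one marker, showing excess sharing.
    print(ibd_chi_square(15, 45, 40))                    # 13.5
    print(round(mean_test_z(15, 45, 40), 2))             # 3.54
```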
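The point that the region of disequilibrium shrinks over time can be made quantitative with the standard two-locus decay relation $D_t = D_0 (1 - \theta)^t$, where θ is the recombination fraction between the disease mutation and a marker and t is the number of generations. The relation ignores drift, selection, and recurrent mutation; that is an assumption of this sketch, not something the slides state. The function names are hypothetical.

```python
from math import log


def ld_remaining(theta: float, generations: float) -> float:
    """Fraction of the original disequilibrium left: D_t / D_0 = (1 - theta) ** t."""
    return (1.0 - theta) ** generations


def ld_half_life(theta: float) -> float:
    """Generations until half of the original disequilibrium is gone (0 < theta < 1)."""
    return log(0.5) / log(1.0 - theta)


if __name__ == "__main__":
    # Markers at increasing recombination fractions from the mutation, 100 generations later.
    for theta in (0.001, 0.01, 0.1):
        print(theta, round(ld_remaining(theta, 100), 3))
    # 0.001 0.905, 0.01 0.366, 0.1 0.0: only tightly linked markers stay in disequilibrium.
    print(round(ld_remaining(0.001, 1000), 3))  # 0.368: same marker, 10x older mutation
    print(round(ld_half_life(0.01)))            # 69
```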
A Few Case Studies
• Varying levels of success: there's a long way to go on this problem.
• Breast cancer
• Alzheimer disease
• Type 1 (insulin-dependent) diabetes
• Schizophrenia
• Inflammatory bowel disease

Breast Cancer
• Most cancer is spontaneous, but certain types run in specific families.
  – Often associated with early onset of the disease
• For breast cancer, the lifetime risk for a woman is about 1 in 12. (For men, the risk is about 1 in 1000.)
• However, roughly 5% of cases are from families with near-Mendelian inheritance.
  – Recurrence risk is about 47-fold for sisters of breast cancer patients whose mothers also had it, compared with random members of the population.
  – These families also show early onset (age 45 or less), bilateral involvement, and, in many cases, associated ovarian cancer. Also, some males in these families get it.
• Complex segregation analysis concluded that transmission was most compatible with an autosomal dominant gene.
• A large linkage analysis found a gene at 17q21, using 23 extended families of European origin with 146 cases of breast cancer. The lod score was 5.98 for early-onset cases, but there was no linkage for late-onset cases. A later study almost as large failed to find this linkage, but it was confirmed by other studies.
  – Lifetime penetrance of 80-90% for breast cancer and 40-50% for ovarian cancer (since revised downward to 60-80% for breast cancer)

More Breast Cancer
• The gene, called BRCA1, was positionally cloned (with accompanying agony). Many different mutations have been seen, in many ethnic groups.
• Another major gene, BRCA2, was mapped to 13q12 and cloned.
• Both genes are involved in DNA repair as part of a larger complex. Other cancers sometimes involve BRCA1 and BRCA2 mutations. DNA repair failure may be the actual effect of these genes on cancer.
• However, many spontaneous cases of breast cancer don't involve either gene, and these genes don't account for all of the cancer-family cases either.
  There are several other genes known to increase susceptibility, and there is also a background level of polygenic risk. This is a case where the phenotype is distant enough from the genes to make the exact causal pathway difficult to trace.

Alzheimer Disease
• The disease was first described by Alois Alzheimer in Germany in 1906. His boss, Emil Kraepelin, named the disease after him.
• Amyloid plaques are found in the brains of people with Alzheimer's, and also in people with trisomy 21 (Down syndrome). Plaque formation seems to be an early and inevitable part of the disease.
  – The amyloid consists of aggregates of a 42-amino-acid polypeptide, which comes from exons 16 and 17 of a protein called "amyloid beta A4 precursor protein", or APP.
• Some families show an early-onset type (before age 65) that is inherited in near-Mendelian fashion as an autosomal dominant.
  – mapped to 21q21
• Further mapping in other early-onset families located two more major genes for early-onset familial Alzheimer's: presenilin 1 and presenilin 2. These proteins are part of a transmembrane protein complex called "gamma-secretase".
  – Amyloid is generated from APP by 2 cleavages. First, beta-secretase cuts off most of the extracellular portion of APP. Then gamma-secretase cuts the remaining portion of APP within the membrane. This generates the 42-amino-acid amyloid peptide that aggregates and causes the disease.
  – Mutant forms of PS1 or PS2 increase the rate of gamma-secretase cleavage and thus increase the production of the form of amyloid that gets deposited in Alzheimer's.

More Alzheimer Disease
• Families with multiple cases of late-onset AD identified a locus at 19q13, later identified as APOE.
  – APOE is apolipoprotein E, with 30 or so known alleles, several of which lead to elevated cholesterol and triglycerides in the plasma, along with increased risk of atherosclerosis. The protein is a component of chylomicrons (lipid digestion).
  The E2 variant form doesn't bind well to the receptor on liver cells. APOE also binds to amyloid protein.
  – Alzheimer's is associated with the E4 form, whose allele frequency is about 15% in people of European origin. Homozygosity almost inevitably leads to Alzheimer's by age 80, and heterozygosity increases the risk.
  – But many Alzheimer's cases don't seem to involve this gene, so gene tests for it don't do a good job of identifying risk.

Type 1 Diabetes
• Insulin-dependent diabetes mellitus (IDDM): an autoimmune disease in which the insulin-secreting beta cells of the pancreas are destroyed. It usually occurs at a young age and brings a lifelong dependence on insulin.
• Insulin causes sugar to move from the blood into the cells. In the absence of insulin, blood sugar builds up while the cells starve.
• Insulin is a peptide hormone that binds to a surface receptor and initiates a signaling cascade that results in the activation of glucose transporters.

More Diabetes
• Family association: $\lambda_S = 15$, and MZ concordance is 30%.
• Linkage to the MHC, specifically the class II gene DQ beta. Low risk is associated with an aspartic acid at residue 57; other amino acids there confer a high risk.
  – In one study, 96% of diabetics were homozygous for an amino acid other than aspartate at this position, while only 20% of controls were homozygous for non-Asp.
  – This difference accounts for a 30-fold difference in the rate of IDDM when comparing Chinese and European-Americans.
• A second important locus for IDDM is INS, the insulin gene itself; specifically, a region at the 5' end of the gene containing a 14 bp minisatellite.
  – A small number of repeats (26-63) confers susceptibility to IDDM, while people with a large number of repeats (140-210) are largely protected from diabetes.
  – More repeats means a 2-3 fold increased rate of insulin transcription.
  – The argument is that during maturation of T cells in the thymus around the time of birth, any T cells that react to insulin are more likely to be deleted if more insulin is expressed there.
• At least 15 other loci have been identified that increase susceptibility to IDDM. All seem to have small effects, and results often fail to replicate across studies.
• The disease is genetically heterogeneous.

Schizophrenia
• Schizophrenia is a psychiatric diagnosis with no obvious physical manifestations. It involves disorganized thinking and impairment of the person's perception of reality and ability to function socially, sometimes with hallucinations or delusions. It may be more than one disease.
  – A common misunderstanding is that schizophrenia involves split or multiple personalities. It doesn't: that's a different disorder. The name comes from the Greek for "split mind", but the split meant is a separation from reality, not between personalities.
• Schizophrenia typically appears in the late teens and early 20s. Lifetime risk in the general population is about 1%.
• It can be triggered by stressful events or drug use, or have no obvious triggering event.
• There is a strong genetic component.
  – Concordance is 46% for MZ twins, even when they are reared in separate families, and 14% for DZ twins of the same sex.
  – Recurrence risk (λ) is 14 for first-degree relatives, 4.25 for second-degree relatives, and 2 for third-degree relatives.
  – Adoption studies show that the risk is associated with the biological family and not the adoptive family.
• Numerous mapping studies using many methodologies over the past 30 years have identified 20 or more chromosomal regions that seem to be associated with schizophrenia in some pedigrees. Some show up repeatedly, but often they are not reproduced in further studies.
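The schizophrenia figures can be compared with a few lines of arithmetic: the MZ/DZ concordance ratio from the Twin Studies slide, and how quickly the excess risk λ - 1 falls with each degree of relationship. The interpretive rule in the comments, that under a single-locus (or additive) model λ - 1 should fall by roughly a factor of 2 per degree and that a steeper fall suggests several interacting loci, comes from the genetic-epidemiology literature rather than from these slides, so treat it as an added assumption. Function names are hypothetical.

```python
# Arithmetic on the schizophrenia figures quoted above.
MZ_CONCORDANCE = 0.46
DZ_CONCORDANCE = 0.14
LAMBDA_BY_DEGREE = {1: 14.0, 2: 4.25, 3: 2.0}  # recurrence risk ratios by degree


def concordance_ratio(mz: float, dz: float) -> float:
    """MZ / DZ concordance; higher values point to a larger genetic component."""
    return mz / dz


def excess_risk_drops(lambdas: dict[int, float]) -> list[float]:
    """Ratio of (lambda - 1) between successive degrees of relationship.

    Assumption (not from the slides): a single-locus model predicts a drop of
    roughly 2 per degree; larger drops are usually read as several interacting loci.
    """
    degrees = sorted(lambdas)
    return [(lambdas[a] - 1) / (lambdas[b] - 1) for a, b in zip(degrees, degrees[1:])]


if __name__ == "__main__":
    print(round(concordance_ratio(MZ_CONCORDANCE, DZ_CONCORDANCE), 2))  # 3.29
    print(excess_risk_drops(LAMBDA_BY_DEGREE))                          # [4.0, 3.25]
```

Both drops exceed 2, which, under the stated assumption, is consistent with the slides' picture of many contributing regions rather than one major gene.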