Drug-dependence disorders (we focus here on cocaine, opioid, and nicotine dependence) are genetically influenced. Risk genes have been located based primarily on genetic linkage studies, and identified primarily based on genetic association studies. In this article we review salient results from linkage, association, and genome-wide association study methodologies, and discuss future prospects for risk allele identification based on these, and on newer, methodologies. Although considerable progress has been made, it is likely that the application of more extensive sequencing than has previously been practical will be required to identify a fuller range of risk variants.
It is well established that risk for many substance-dependence traits is genetically influenced; this is the case for each specific substance that has been studied. This has been determined using the methods of genetic epidemiology, the most relevant of which, for this purpose, are twin and adoption studies. We discuss relevant findings from genetic epidemiologic studies of drug use and use disorders below.
In considering drug dependence, we include the most commonly used illegal substances (primarily cocaine, opioids, marijuana, and methamphetamine) and also nicotine, a legal drug that is the dependence-causing substance in tobacco. Alcohol dependence (AD) shares many risk genes with the drug-dependence disorders, but is beyond the scope of the present article. We have recently reviewed AD genetics elsewhere. 
As is usual for complex traits, risk for drug dependence is influenced by both genetic and environmental factors. Compared with most other kinds of traits, though, environmental factors, most obviously exposure to the substance, are crucial: you cannot become heroin-dependent, for example, if you live in an environment with no access to heroin. Because the availability of illegal substances of abuse varies over the world (to a much greater extent than the availability of either alcohol or tobacco), and also varies with time as a function of secular trends in substance use that are determined by fads, trends in law enforcement, and other factors, patterns of substance dependence are very different across the globe. Genetic epidemiologic studies have helped to clarify the important implications of this environmental variation for genetic studies.
Family studies have shown substantially higher rates of drug abuse among siblings (particularly those whose parents were positive for substance abuse) than among individuals in the community. Such studies have also provided evidence for both general familial aggregation for substance-use disorders and substance-specific aggregation across a wide range of drugs, including nicotine, opioids, cocaine, and cannabis. However, while these designs demonstrate that drug-dependence disorders are familial, they cannot distinguish between genetic and environmental contributions to this familiality. A demonstration of genetic contributions to these and other disorders requires other designs, prominently twin and adoption studies.
Two adoption studies conducted by Cadoret et al showed that the only biological factor that was significantly associated with drug abuse in the proband was an alcohol problem in first-degree relatives. However, Tsuang et al, studying a sample of more than 3000 twin pairs, found a significantly greater pairwise concordance rate for monozygotic (MZ) than dizygotic (DZ) twins for abuse of marijuana, stimulants, cocaine, and for all drugs combined. Using twin pairs ascertained through the Virginia Twin Registry, Kendler et al examined concordance rates for drug use and dependence among more than 800 female-female pairs. Model fitting showed that twin resemblance for liability to the use of cocaine, cannabis, hallucinogens, opioids, and sedatives was due to both genetic and family environmental factors. Liability to abuse or dependence on cocaine and cannabis was due only to genetic factors. In contrast, however, in another study by Kendler et al of the use and misuse of six classes of illicit drugs by nearly 1200 male-male twin pairs, model fitting revealed that one common genetic factor exerted a potent influence on risk for both substance use and misuse for all six substances. There was a modest effect of substance-specific genetic factors on risk of substance use but, in contrast to other studies cited above, not on risk of abuse or dependence. A single common shared environmental factor was also found to exert an effect on risk of substance use and, to a lesser extent, on risk of abuse or dependence.
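The concordance comparison used in twin studies can be illustrated with a minimal sketch. The counts below are hypothetical and are not taken from any of the studies cited above; the function names are ours. Pairwise concordance is the share of affected pairs in which both twins are affected; the probandwise form shown here assumes complete ascertainment of affected twins.

```python
def pairwise_concordance(concordant_pairs, discordant_pairs):
    """Share of pairs with at least one affected twin in which both are affected."""
    return concordant_pairs / (concordant_pairs + discordant_pairs)


def probandwise_concordance(concordant_pairs, discordant_pairs):
    """Chance that the co-twin of an affected twin is also affected
    (assumes every affected twin is ascertained as a proband)."""
    return 2 * concordant_pairs / (2 * concordant_pairs + discordant_pairs)


# Hypothetical counts for illustration only.
mz = {"concordant": 30, "discordant": 70}
dz = {"concordant": 15, "discordant": 85}

mz_rate = pairwise_concordance(mz["concordant"], mz["discordant"])
dz_rate = pairwise_concordance(dz["concordant"], dz["discordant"])
mz_probandwise = probandwise_concordance(mz["concordant"], mz["discordant"])

print(f"MZ pairwise concordance: {mz_rate:.2f}")
print(f"DZ pairwise concordance: {dz_rate:.2f}")
```

A higher MZ than DZ rate, as in these made-up numbers, is the pattern that points toward a genetic contribution; the model fitting described above goes further by partitioning liability into genetic and environmental components.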
Despite some contradictory findings, overall, the data from adoption, twin, and family studies support a substantial genetic contribution to drug dependence, including the existence of genetic factors specific to each of these disorders, and factors common to these disorders and other forms of substance dependence.
It is only common genetic factors (that is, those that influence more than one substance) that are likely to be important worldwide (genetic factors specific to substances will vary because the specific substances vary). Whether genes relevant to drugs of abuse that have some similarities in their mechanisms of action, such as cocaine (important, eg, in the US) and methamphetamine (predominant, eg, in Thailand, and important in certain regions in the US) will prove to overlap, is still an open question. Further, different risk factors may be important in different populations (discussed in ref 1). In the small number of instances where similar SD traits have been studied in different populations, the genetic factors uncovered have not been identical.
Thus, gene mapping for substance-dependence (SD) traits is complicated. Some risk alleles identified may be important only for specific substances of abuse and others, only for certain populations. So why try to map genes for SD traits? First, SD is a huge cause of morbidity and mortality worldwide; that is, it is a very important problem that deserves to be studied despite its complexity. Second, despite all of the a priori reasons to believe that it would be exceedingly difficult to identify genes and validate the findings, the track record for SD genetics as a field is really very good. Below, we will review some recent results that support this claim.
Genome-wide linkage studies, the traditional approach to identifying risk loci, provide chromosomal locations for risk-influencing loci based on the observation of coinheritance of marker alleles and the disease trait in families. To be comprehensive, linkage studies employ markers that map throughout the entire genome. This approach has been used for cocaine, opioid, and nicotine dependence, and for related traits.
We are aware of only one linkage study of cocaine dependence (CD); we studied a sample of small families each with at least one subject affected with CD, which included 528 full and 155 half sibpairs and was 45.5% European-American (EA) and 54.5% African-American (AA). We completed an autosomal genome-wide linkage scan for the CD diagnosis, cocaine-induced paranoia, and cocaine-related subphenotypes derived using cluster analytic methods. The subtyping procedure was used to identify more genetically homogeneous subgroups of subjects in which the effects of individual risk loci might be more prominent. For CD, we found “suggestive” linkage signals on chromosome 10, in the full sample, and on chromosome 3, in the EA part of the sample. Much stronger results were obtained for the cluster-derived subtypes, including genome-wide-significant lod scores for membership in the “Heavy Use, Cocaine Predominant” cluster on chromosome 12 and for membership in the “Moderate Cocaine and Opioid Abuse” cluster on chromosome 18. In AA families only, we observed a genome-wide-significant lod score on chromosome 9 for the trait of cocaine-induced paranoia. Genome-wide significance was defined on the basis of Lander and Kruglyak's 1995 criteria.
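To give a feel for how demanding “suggestive” and “genome-wide significant” lod scores are, the following sketch converts a lod score to an approximate pointwise P value. It assumes the usual large-sample result that 2·ln(10)·lod behaves like a one-sided chi-square statistic with 1 degree of freedom. The lod values 2.2 and 3.6 are the thresholds commonly quoted from Lander and Kruglyak for allele-sharing analyses in sib pairs and are used here as an assumption; the text above does not state which numerical thresholds were applied.

```python
import math


def lod_to_pointwise_p(lod):
    """Approximate one-sided pointwise P value for a lod score.

    Assumes 2*ln(10)*lod follows a 50:50 mixture of zero and a
    chi-square distribution with 1 degree of freedom under the null.
    """
    chi2 = 2.0 * math.log(10) * lod
    # For 1 degree of freedom, P(chi-square > x) = erfc(sqrt(x / 2)).
    return 0.5 * math.erfc(math.sqrt(chi2 / 2.0))


# Assumed thresholds (see lead-in); not taken from the article itself.
p_suggestive = lod_to_pointwise_p(2.2)
p_significant = lod_to_pointwise_p(3.6)

print(f"lod 2.2 -> pointwise P ~ {p_suggestive:.1e}")
print(f"lod 3.6 -> pointwise P ~ {p_significant:.1e}")
```

The pointwise values are far below 0.05 because a genome scan tests many positions; the thresholds are set so that the genome-wide false-positive rate, not the per-position rate, is controlled.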
There have been three independent genome-wide linkage studies of opioid dependence (OD). We studied 393 small families each with at least one individual affected with OD. We completed a genome-wide linkage scan for DSM-IV OD, and, as for the CD study, for cluster-defined phenotypes, a heavy-opioid-use cluster, and a non-opioid-using cluster. The strongest results were, again, seen with the cluster-defined traits: for the “heavy opioid users” cluster there was a genome-wide-significant linkage for EA and AA subjects combined, on chromosome 17. For the “nonopioid users” cluster, there was a genome-wide-significant linkage elsewhere on chromosome 17, for EA subjects only. Lachman et al studied a mixed US sample of 305 OD-affected sibling pairs, and identified evidence for linkage on a region of chromosome 14 overlying the neurexin 3 gene (NRXN3). They also identified a male-specific linkage peak on chromosome 10q. Finally, Glatt et al studied a sample of nearly 400 independent affected sibling pairs ascertained in China near the Golden Triangle, one of Asia's largest illicit opium-producing areas, but did not identify any strongly supported linkage signals, despite the presumed genetic homogeneity of the sample. The strongest signal they observed was on chromosome 4q.
There have been numerous genome-wide linkage scans for smoking and related phenotypes, reviewed in ref 20. Han et al completed a genome scan meta-analysis (GSMA) of genome-wide linkage scans for nicotine dependence (ND) and related traits, pooling all available independent genome scan results on smoking behavior. To minimize locus heterogeneity, subgroup analyses of the smoking behavior assessed by the Fagerström Test for Nicotine Dependence (FTND) and maximum number of cigarettes smoked in a 24-hour period (MaxCigs24) were also carried out. Fifteen genome scan results were available for analysis, including 10 253 subjects in 3 404 families. The primary GSMA across all smoking behavior identified a genome-wide “suggestive” linkage in chromosome 17q24.3-q25.3. But the strongest result derived from the subgroup analysis of MaxCigs24 (including 966 families with 3 273 subjects), which identified a genome-wide-significant linkage in 20q13.12-q13.32. CHRNA4, a strongly supported ND candidate gene, is located in this interval; Li et al previously reported on association of CHRNA4 variants with ND.
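The idea behind a rank-based genome scan meta-analysis can be sketched as follows. This is a simplified, unweighted illustration and not the implementation used by Han et al: the genome is divided into bins, each study ranks its bins by its strongest linkage statistic, ranks are summed across studies, and the summed rank of each bin is compared with sums obtained when ranks are shuffled within studies. The scores, the number of bins, and the function names are all hypothetical.

```python
import random


def rank_bins(scores):
    """Rank bins within one study: weakest bin gets 1, strongest gets n."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0] * len(scores)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks


def summed_ranks(ranked_studies):
    """Sum the ranks of each bin across studies."""
    return [sum(column) for column in zip(*ranked_studies)]


def gsma_pvalues(studies, n_perm=2000, seed=0):
    """Permutation P value per bin for the observed summed rank."""
    rng = random.Random(seed)
    ranked = [rank_bins(s) for s in studies]
    observed = summed_ranks(ranked)
    exceed = [0] * len(observed)
    for _ in range(n_perm):
        shuffled = []
        for ranks in ranked:
            perm = ranks[:]
            rng.shuffle(perm)  # break any link between bin and rank
            shuffled.append(perm)
        null = summed_ranks(shuffled)
        for b, (null_sum, obs_sum) in enumerate(zip(null, observed)):
            if null_sum >= obs_sum:
                exceed[b] += 1
    return observed, [(count + 1) / (n_perm + 1) for count in exceed]


# Hypothetical maximum linkage scores for 6 bins in 3 studies.
studies = [
    [0.1, 0.5, 2.8, 0.3, 1.0, 0.2],
    [0.4, 0.2, 3.1, 0.9, 0.6, 0.1],
    [0.3, 0.8, 2.2, 0.1, 0.5, 0.7],
]
observed, pvals = gsma_pvalues(studies)
best_bin = pvals.index(min(pvals))
print(observed, best_bin)
```

Because only ranks are pooled, scans that used different markers and statistics can be combined, which is what makes the approach attractive for heterogeneous smoking phenotypes.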
A high level of statistical support for a genetic linkage is very valuable, but the ultimate proof that a disease-influencing locus underlies a statistical linkage peak is the identification of a risk gene in the peak that accounts for the linkage signal. The next step is typically genetic association analysis, ie, evaluation of a set of markers that map under the linkage peak for association with the trait. Genetic association provides another degree of statistical evidence, but eventually, proof of a disease-gene relationship must rely on demonstration of a functional effect of a variant or variants at the risk locus. ND is the furthest of all drug-dependence (DD) traits along this pathway, with numerous loci supported on the basis of statistical genetic association evidence, and some of these loci have received the higher level of support of functional data.
Strategy for single-nucleotide polymorphism (SNP) selection plays a key role in association study outcome. In general, variants predicted to have functional consequences, eg, because they alter predicted amino acid sequence, have been favored for study; alternatively, researchers often try to capture most of the genetic variation at a locus via selection of haplotype tagging SNPs followed by haplotype reconstruction. It is important to recognize some of the limitations of these strategies at the outset. Although most common putatively functional SNPs are known, rarer SNPs may have large phenotypic effects, and there are many such variants yet to be discovered. Not all functional SNPs are easily recognized as such. SNPs vary by population, and populations differ in the extent to which common genetic variation has been identified. The same population variation is reflected in differences in haplotype structure. Finally, haplotype reconstruction is almost always accomplished via computer algorithms, and the results are estimated. With these limitations in mind, we discuss several examples of genetic associations with DD phenotypes, focusing on interesting physiological candidates and on replicated findings.
Association of variants that map at or near the D2 dopamine receptor (DRD2 locus) with drug or alcohol dependence was proposed many years ago and has been widely debated. We identified a “suggestive” linkage peak for ND at the region of chromosome 11 that includes the NCAM1-TTC12-ANKK1-DRD2 gene cluster. The inconsistent results with DRD2 may be attributable to an indirect effect: the observed association could actually be mediated through variation at a nearby locus in linkage disequilibrium with DRD2. To test this hypothesis, we genotyped 43 SNP markers in a region including DRD2 and the three adjacent genes, in an SD linkage sample of >1600 subjects. We found very strong evidence of association of multiple SNPs at TTC12 and ANKK1 in two different populations, EAs and AAs (minimal P=0.0007 in AAs and minimal P=0.00009 in EAs), and highly significant association of a single haplotype (set of markers) spanning TTC12 and ANKK1 with ND in the pooled sample (P=0.0000001). Thus, a risk locus for ND maps to a region that spans TTC12 and ANKK1. The exact localization of the risk haplotype depends on the disease definition, and on whether and which co-occurring diagnoses are present in the study sample.
These results support the hypothesis that the DRD2 findings could be attributable to variants in nearby loci. Such variants could reflect either functional variation that affects those loci (and not DRD2), or relatively distant regulatory regions important for DRD2 function. The ANKK1 finding in ND has been replicated.
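As a rough check on how the single-SNP results from the 43-marker analysis described above compare with a multiple-testing threshold, the sketch below applies a simple Bonferroni correction. This is our illustration only: the original analysis may have used a different correction, and a Bonferroni bound is conservative when markers are in linkage disequilibrium. It also ignores testing across populations and phenotypes.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold that keeps the family-wise error at alpha."""
    return alpha / n_tests


n_snps = 43  # number of SNP markers genotyped, as stated in the text
threshold = bonferroni_threshold(0.05, n_snps)

# Minimal single-SNP P values reported in the text for each population.
min_p = {"AA": 0.0007, "EA": 0.00009}
survives = {pop: p < threshold for pop, p in min_p.items()}

print(f"Bonferroni threshold for {n_snps} tests: {threshold:.5f}")
print(survives)
```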
Another set of risk loci that are of interest in relation to the risk of drug dependence are those encoding proteins that regulate or mediate opioidergic function. All of the opioid receptor genes have been reported to be associated with substance dependence liability. A functional polymorphism in OPRM1 (Asn40Asp), which encodes the mu-opioid receptor, has been the most extensively studied in this regard, though the association is controversial. Although multiple studies have shown a significant allelic association with DD, they are nearly evenly divided between those showing a significant excess of the Asp40 allele among cases, and those showing a significant excess of the Asp40 allele among controls. Consequently, meta-analyses of that literature failed to show a reliable association of the SNP with either OD or any SD disorder. However, Zhang et al examined 13 SNPs spanning the coding region of OPRM1 in a sample of EAs with AD and/or DD and 338 EA healthy controls. The SNPs formed two haplotype blocks. There were significant differences between cases and controls in allele and/or genotype frequencies for SNPs in Block I and in Block II, after correction for multiple testing. Haplotypes constructed from five tag SNPs differed significantly in frequency between both AD and DD subjects and controls. Logistic regression analyses in which the sex and age of subjects and alleles, genotypes, haplotypes, or diplotypes of the five tag SNPs were considered confirmed the association between OPRM1 variants and SD.
Zhang et al also examined the genes encoding the other two opioidergic receptors: OPRD1 (which encodes the delta receptor) and OPRK1 (which encodes the kappa receptor). Eleven SNPs spanning OPRD1 were examined in EAs with AD, CD, and/or OD, and control subjects. Although nominally significant associations were observed for five SNPs with SD, only the association of the nonsynonymous variant G80T with OD remained significant after correction for multiple testing. Haplotype analyses with six tag SNPs indicated that a specific haplotype was significantly associated with AD and OD (P<0.001). In logistic regression analyses, controlling for sex and age, this haplotype had a risk effect on AD and, to a much greater extent, on OD. In addition, seven SNPs covering OPRK1 were examined in the majority of subjects and although there were no significant differences in allele, genotype, or haplotype frequency distributions between cases and controls, a specific OPRK1 haplotype was significantly associated with AD, but not DD. In summary, these findings demonstrated a robust positive association between OPRD1 variants and SD, particularly OD.
Finally, Zhang et al studied POMC, the gene that encodes pro-opiomelanocortin, from which functionally different peptides are derived via tissue-specific post-translational processing; of particular relevance here are two principal elements of the hypothalamic-pituitary-adrenal axis: adrenocorticotropin (ACTH) and β-endorphin. Five SNPs spanning POMC were examined in independent family and case-control samples of EAs and AAs. The families were ascertained based on a pair of siblings affected with cocaine and/or opioid dependence. Case-control studies included cases affected with AD, CD, and/or OD and controls. Family-based analyses revealed an association of one SNP (rs6719226) with OD in AA families, and a different SNP (rs6713532) with CD in EA families. Case-control analyses demonstrated an association of rs6713532 with AD or CD. Moreover, the minor allele of a third SNP was a risk factor for CD or OD in AAs, and for AD, CD, or OD in EAs. Logistic regression analyses in which sex and age were considered and population stratification analyses confirmed these findings. Additionally, specific haplotypes increased risk for CD in AAs and OD in EAs.
In summary, as might be expected given that the brain's opioidergic system plays a central role in reinforcement, which has important implications for addiction, variation in a number of functional candidate genes encoding opioidergic proteins has been implicated in dependence on alcohol, cocaine, and opioids. Assuming independent replication of these findings, a key question to be addressed is the nature of the gene-gene and gene-by-environment interactions to which risk of SD is attributable.
Other studies have demonstrated associations with the cannabinoid receptor gene (CNR1), neurexin 1 (NRXN1), and a set of alcohol-metabolizing enzymes. A clear pattern emerges from the examination of this sampling of candidate gene associations with SD: insofar as genes with known function are concerned, there are no big surprises with respect to physiology. (This cannot be said about genes without clearly delineated functional roles, such as ANKK1, which was identified, not incidentally, based on its position, rather than its function.) This highlights the limitations of the candidate gene approach, which is often inherently biased by prior knowledge about physiology. Unbiased studies have greater potential to reveal new mechanisms of addiction, and that is a key attraction of the genome-wide association study (GWAS) methodology discussed below.
GWASs are an alternative to linkage for locating genes anywhere in the genome without prior hypotheses.
GWAS designs are of interest due to their potential to identify risk loci of relatively small effect, much smaller than can be detected through linkage strategies. (In fact, one controversy engendered by the widespread adoption of GWAS designs is that the risk alleles identified often have such a small effect, typically with odds ratios less than 1.2, that it is hard to know what to do with them once they have been identified.) A second advantage of GWASs is that they may be based on case-control samples, which are easier to recruit than the family samples that linkage requires. Family samples are more difficult to recruit (markedly so for many kinds of SD because of the tendency of these disorders to fragment families) and can introduce certain kinds of bias. The first GWAS for a specific SD trait, excluding studies that used a pooling methodology exclusively (see ref 42), examined ND. This study employed a two-stage design: first, pooled DNA was used to screen 2.4 million SNPs; second, >30 000 SNPs selected from the first stage were screened individually in ~1000 cases and ~1000 controls. Numerous genes were identified as possibly associated with ND, including both novel genes and genes that were previously considered candidates based on known physiology (eg, cholinergic receptor, nicotinic, beta 3, CHRNB3). The latter finding has been confirmed in larger studies: subsequent GWASs have demonstrated highly significant associations between variation in the nicotinic receptor gene cluster CHRNA5-CHRNA3-CHRNB4 and ND and related traits, and with lung cancer.
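To make concrete what an odds ratio below 1.2 looks like in a case-control sample, the sketch below computes an allelic odds ratio with an approximate 95% confidence interval (Woolf's log-based method) from a 2×2 table of allele counts. The counts are hypothetical, chosen to mimic roughly 1000 cases and 1000 controls with risk-allele frequencies of 0.33 and 0.30; they do not come from any study cited here.

```python
import math


def allelic_odds_ratio(case_risk, case_other, control_risk, control_other):
    """Odds ratio and approximate 95% CI from allele counts (Woolf method)."""
    odds_ratio = (case_risk * control_other) / (case_other * control_risk)
    se_log_or = math.sqrt(
        1 / case_risk + 1 / case_other + 1 / control_risk + 1 / control_other
    )
    lower = math.exp(math.log(odds_ratio) - 1.96 * se_log_or)
    upper = math.exp(math.log(odds_ratio) + 1.96 * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical allele counts: 2000 alleles in cases, 2000 in controls.
odds_ratio, lower, upper = allelic_odds_ratio(
    case_risk=660, case_other=1340, control_risk=600, control_other=1400
)
print(f"OR = {odds_ratio:.3f}, 95% CI {lower:.3f} to {upper:.3f}")
```

With effects of this size the interval is wide relative to the effect itself, which is one reason large samples and replication are needed before a GWAS signal is accepted.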
In a hypothesis-generating study, we studied a set of 5633 SNP markers in 1699 subjects from 339 AA families and 334 EA families ascertained through a sib pair meeting DSM-IV criteria for either CD or OD. This is considered a sparse marker set for the purposes of GWAS. It is expected to interrogate <10% of the genome and thus cannot be considered a study of truly genome-wide depth. Associations between these markers and five substance dependence traits (CD, OD, AD, ND, and cocaine-induced paranoia) were assessed by family-based association tests (FBAT). The top-ranked result was an association of a specific SNP in the MANEA gene with cocaine-induced paranoia. This study provided an initial SD trait-specific blueprint of associated regions for future candidate gene studies. There are, at the time of this writing, no published GWASs for several of these traits. The MANEA finding was replicated and extended in a larger sample.
We identify two main ways to account for the relatively consistent results seen in this field. First, diagnosis can be made with high reliability. Second, the phenotypes are relatively straightforward because they are, in their essence, pharmacogenetic. That is, SD phenotypes reflect genetic moderation of the subjective response to drugs of abuse.
While results in this research field have been relatively consistent, most of the genetic risk for DD has yet to be attributed to specific alleles. Initially, it was thought that the GWAS was the answer to the problem. But application in other complex traits (eg, schizophrenia, bipolar affective disorder, autism) has revealed a more complex picture, such that even clinical samples that should have been adequately powered have fallen short of providing definitive and significant results. The explanation for this situation may reside in the fundamental genetic architecture of some complex traits. GWAS is based on a common-disease-influenced-by-common-allele model. However, we are now learning that many phenotypes are influenced instead by sets of variants, in sets of loci, each of which is rare on a population level. Such variants are likely to be uncovered only by extensive sequencing of affected and unaffected individuals. Copy number variation (CNV) is another mechanism that is proving to be important in modulating disease risk. Such variation is important for at least some behavioral traits; for example, Sebat et al have reported on the relationship of CNV to autism, and several groups have reported association of rare structural variants with schizophrenia.
We have seen several successful examples of genetic association identified following a linkage finding, a sequence that demonstrates the main utility of genetic linkage. But there have also been surprisingly many instances when strong genetic association has not been identified readily. There are many ways to account for such a circumstance: genetic heterogeneity, random variation, and population variation, to name a few. Another intriguing possibility has become more prominent of late. The linkage-to-association-to-gene model is premised basically on the common disease-common variant model discussed above. This model may not be as applicable as was thought; there is increasing evidence that heritability may be accounted for by many rare variants in either a single locus, or a set of related loci. Since linkage depends on the identification of coinheritance of trait and marker within families, it stands to reason that a set of different rare variants could be detected by linkage (even if the responsible variants differed greatly between families in the discovery set). Such variants would be very resistant to discovery by ordinary tagging haplotype association strategies. Similarly, such variants would be expected to be refractory to discovery by GWAS methodology. Deep sequencing studies have successfully accounted for the “missing” genetic variance in some cases. For example, Nejentsev et al found a set of individually rare variants at the IFIH1 locus that affect risk for type 1 diabetes, following up on a GWAS. Ji et al started with a set of genes known to have large effects on blood pressure in a small number of severely affected families, and sequenced them in a large number of unrelated individuals. Rare variants with smaller effects on blood pressure were identified.
These findings are likely to be relevant for SD genetics research as well, inasmuch as deep sequencing of candidate loci in many unrelated individuals may be necessary to account for a greater proportion of the genetic risk than is presently known.
Whole-genome sequencing is becoming progressively less expensive, and will surely ultimately be feasible for locating genetic variants that increase risk for complex genetic traits, albeit at the risk of daunting statistical problems. Sequencing of expressed sequences only (“whole exome”) may be a valuable interim step. Ng et al have demonstrated the feasibility of this approach. In summary, new developments in a variety of genetic methods and in the accumulating molecular evidence of the genetic risk for SD promise to yield greater insights into the etiology of these disorders, bringing into relief the environmental contributions and creating opportunities for prevention and new therapeutic options.