Cloning methods rely on molecular biological processes that occur in nature. The techniques are continually being refined and simplified, and many current strategies allow sequences of interest to be cloned from their sources more efficiently. These cloning strategies include PCR cloning, subcloning, DNA library construction, and shotgun cloning.
PCR cloning is a method in which double-stranded DNA fragments amplified by PCR are ligated directly into a vector. PCR cloning offers some advantages over traditional cloning, which relies on digesting double-stranded DNA inserts with restriction enzymes to create compatible ends, purifying and isolating sufficient amounts, and ligating into a similarly treated vector of choice (see insert preparation).
Because the insert is amplified by PCR, this cloning technique requires much less starting template, which may be cDNA, genomic DNA, or another insert-carrying plasmid (see subcloning basics). Furthermore, PCR cloning provides a simpler workflow because it does not require suitably located restriction sites that are compatible between the vector and the insert. Nevertheless, there are a number of considerations related to PCR primers and amplification conditions, the cloning method of choice and the cloning vectors used, and, finally, confirmation of successful cloning and transformation.
With respect to PCR amplification of a sequence of interest, primers must be designed and PCR conditions (components and cycling) optimized for efficient and specific amplification of the template. Primer design tools are available to bioinformatically evaluate and select suitable target-specific primer sequences for amplification. Ligation requires that either the insert or the vector has 5′-phosphorylated termini; therefore, if the cloning vector lacks 5′-phosphorylated ends, 5′-phosphate groups must be added to the PCR primers during synthesis or with T4 polynucleotide kinase for successful ligation. For PCR optimization, the most important parameters are reaction component concentrations, annealing temperatures, and template amounts.
TA cloning and blunt-end cloning represent two of the simplest PCR cloning methods. The choice between them depends upon the nature of the vector and the type of PCR enzyme used. TA cloning employs thermostable Taq DNA polymerase, which is capable of amplifying short DNA sequences. This enzyme lacks 3′→5′ proofreading activity and features a terminal transferase activity that adds an extra deoxyadenosine to the 3′ ends of the amplicons (3′ dA). The resulting PCR products with 3′ dA overhangs are readily cloned into a linearized TA cloning vector containing complementary 3′ deoxythymidine (3′ dT) overhangs (Figure 1). While relatively straightforward, this method is limited by the insert length (up to 5 kb), the inability to clone inserts directionally, and the high error rate associated with Taq DNA polymerase.
Blunt-end cloning involves the ligation of an insert into a linearized vector where both DNA fragments lack overhangs. Blunt-end inserts can be produced using high-fidelity DNA polymerases with 3′→5′ exonuclease (proofreading) activity. This proofreading activity improves the sequence accuracy of the amplified products; however, limitations include lower ligation efficiencies when inserting into blunt-end cloning vectors and the inability to clone directionally. Ligation efficiency can be improved by incubating the amplicons with Taq DNA polymerase and dATP in a procedure called “3′ dA tailing” (incubate 20–30 minutes at 72°C), then purifying the 3′ dA-tailed products for TA cloning (Figure 1).
Figure 1. Common PCR cloning strategies.
To further simplify and streamline the cloning workflow, specialized vectors have been developed that accept an insert without, for example, the use of a ligase. One such class of vectors includes the Invitrogen TOPO cloning vectors, which contain covalently linked DNA topoisomerase I that functions as both a restriction enzyme and a ligase (learn more about TOPO cloning technology). Compared to conventional PCR cloning vectors, these vectors offer shorter ligation reaction times (e.g., 5 minutes), greater cloning efficiencies (e.g., >95% positive clones), and a much simpler protocol. Furthermore, directional cloning of the PCR products can be achieved with a specially designed TOPO vector using a specific primer design.
Regardless of the cloning method chosen, cloning efficiencies are significantly improved by purifying the PCR amplicons prior to the ligation reaction. PCR clean-up helps remove salts, nucleotides, nonspecific amplicons, and primer-dimers. After ligation and transformation into the appropriate competent cells, the resulting colonies need to be screened carefully for the correct insert, as well as for its proper frame and orientation, before subsequent studies that analyze gene fusions and/or protein expression.
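As an illustration of the screening step, the following minimal Python sketch checks the orientation and reading frame of an insert in a sequenced clone. It assumes the clone sequence is already available as a string and that the position of the vector's start codon is known. The function names, the `orf_start` parameter, and the toy sequences are hypothetical and serve only as an example.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    comp = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def insert_status(clone, insert, orf_start):
    """Report the insert orientation and whether it is in frame.

    clone     -- sequence read from the recombinant plasmid
    insert    -- expected insert sequence (coding strand)
    orf_start -- index in `clone` of the first base of the vector's
                 start codon (hypothetical input for this sketch)
    """
    clone, insert = clone.upper(), insert.upper()
    pos = clone.find(insert)
    if pos != -1:
        orientation = "forward"
    else:
        pos = clone.find(reverse_complement(insert))
        orientation = "reverse" if pos != -1 else None
    # In frame only if the insert is forward and starts a whole number
    # of codons downstream of the start codon.
    in_frame = orientation == "forward" and (pos - orf_start) % 3 == 0
    return orientation, in_frame


# Toy sequences, invented for illustration only.
left, right = "ATGGGATCC", "TAAGCTT"
insert = "GCTAAGGTT"
clone_forward = left + insert + right
clone_reverse = left + reverse_complement(insert) + right
clone_shifted = left + "A" + insert + right

print(insert_status(clone_forward, insert, 0))
print(insert_status(clone_reverse, insert, 0))
print(insert_status(clone_shifted, insert, 0))
```

In practice, colony PCR, diagnostic restriction digestion, or sequencing supplies the information that this toy check takes as given.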
Subcloning refers to moving a fragment from one plasmid into another plasmid that serves as a vector. There are a variety of reasons why it may be necessary to transfer the fragment of interest into a different vector backbone. For instance, the new vector may possess a specific marker for antibiotic selection or fluorescent expression. Subcloning may also be performed to move a cloned fragment to an expression vector for a host more suitable for the study (e.g., bacteria, mammals, insects, plants, etc.); to place the gene of interest under a different expression promoter (e.g., to switch from a constitutive to an inducible promoter); or to tag or fuse the experimental gene with another protein or a marker. Whatever the goal of the experiment may be, the two most common approaches to subcloning rely on restriction digestion and/or PCR cloning.
Subcloning by restriction digestion is the more traditional of the two methods. In this workflow, the vector and the insert source are each double-digested with two restriction enzymes that generate sticky (cohesive) ends (Figure 2). Since the vector and the insert can ligate in only one orientation (i.e., directionally) and the digested ends of the vector are incompatible for self-ligation, this is arguably the preferred and most common method among the possible restriction enzyme options (see insert preparation for some options). For subcloning in protein expression or gene regulation studies, the selected restriction enzyme(s) should allow in-frame cloning of the fragment of interest, in close proximity to the start codon as appropriate.
A second popular approach uses PCR to amplify the region of interest from the plasmid. The resulting PCR product is then cloned into the desired vector. TA cloning or blunt-end cloning methods can be used as described in the PCR cloning section, but neither approach maintains directionality of the insert. To achieve directional cloning, restriction sites that are present in the destination vector can be incorporated into the PCR amplicons by designing PCR primers that carry the restriction sites at their 5′ ends. After PCR, the products are restriction digested, purified, and subcloned into the restriction sites of the vector.
There are a few considerations when designing PCR primers with restriction enzyme sites. It is imperative that the introduced restriction sites are unique and not present within the sequence of the fragment to be subcloned. The restriction sites should also be carefully designed to allow in-frame expression of the subcloned DNA. The cleavage efficiency of most restriction enzymes is greatly reduced when their recognition sites are close to the termini of linear DNAs. To ensure proper digestion of the PCR fragments, an extra sequence of 4–8 nucleotides (sometimes called a “leader” or “spacer” sequence) is recommended at the 5′ end of the restriction sites on the primers (Figure 3). Although there is no consensus on the optimal spacer sequence, a general recommendation is to avoid sequences that may result in primer-dimers or secondary structure formation (e.g., palindromes and inverted repeats). Furthermore, the template-binding portion of each primer should be longer than the restriction site and the spacer combined to ensure specificity and proper binding to the target. When calculating the Tm of the primers, only the sequence that perfectly matches the template should be included. Finally, purification of the primers may be necessary to ensure full-length DNA oligonucleotides when using long primer sequences.
Figure 3. Schematic workflow of PCR subcloning in combination with restriction digestion (RE = restriction enzyme site).
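The primer design rules above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a replacement for a primer design tool. The template and the 6-nucleotide spacer are invented for the example. EcoRI (GAATTC) and BamHI (GGATCC) stand in for whichever sites the destination vector offers. The Tm is estimated with the simple Wallace rule (2 °C per A/T, 4 °C per G/C), which is only a rough guide for short oligonucleotides.

```python
# Recognition sequences of two common restriction enzymes.
SITES = {"EcoRI": "GAATTC", "BamHI": "GGATCC"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    comp = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def wallace_tm(seq):
    """Rough Tm estimate in degrees C: 2 per A/T plus 4 per G/C."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def site_absent(fragment, site):
    """True if the recognition site occurs on neither strand of the fragment."""
    fragment = fragment.upper()
    return site not in fragment and site not in reverse_complement(fragment)


def build_primer(spacer, site, annealing):
    """Assemble a primer 5'->3': spacer, restriction site, template-binding region."""
    return spacer + site + annealing


# Hypothetical template and spacer, for illustration only.
template = "ATGGCTAGCAAGGGCGAGGAGCTGTTCACCGGGGTGGTGCCCATCCTGGTCGAGCTGGACTAA"
spacer = "TAAGCA"

# The introduced sites must not occur inside the fragment to be subcloned.
for name, site in SITES.items():
    assert site_absent(template, site), f"{name} site found in template"

# Template-binding regions: 18 nt each, longer than spacer plus site (12 nt).
fwd_anneal = template[:18]
rev_anneal = reverse_complement(template[-18:])

forward_primer = build_primer(spacer, SITES["EcoRI"], fwd_anneal)
reverse_primer = build_primer(spacer, SITES["BamHI"], rev_anneal)

# Tm is calculated on the template-matching portion only.
print(forward_primer, wallace_tm(fwd_anneal))
print(reverse_primer, wallace_tm(rev_anneal))
```

Placing two different sites on the forward and reverse primers is what makes the subsequent ligation directional. A real design would also check the reading frame and screen the spacer and primers for dimers and secondary structure.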
Other subcloning strategies have been devised to take advantage of special vectors that do not require the use of restriction enzymes or a ligase. One such example is Invitrogen Gateway cloning, which exploits unique recombination activities of the family of Invitrogen Clonase enzymes (Figure 4). This method involves use of specially designed Gateway-specific plasmids and Gateway-compatible insert ends (att sites) for recombination.
Figure 4. Gateway cloning strategies. ccdB is a toxic gene used in bacterial cell selection.
In molecular cloning, DNA library construction refers to the creation of clones that carry DNA fragments representing the complete genomic DNA (gDNA) of a species, or the complementary DNA (cDNA) of RNA transcripts representing the expressed genome. By constructing DNA libraries, thousands of genetic fragments can be conveniently archived and expanded for downstream applications, such as genotyping and phenotypic screening. gDNA libraries serve as helpful tools to study the genetic composition of different species or gene mutations that occur in diseases such as cancer. cDNA libraries, on the other hand, are useful for expression analyses of genes and transcript variants based on the cell type and tissue origins (spatial), as well as time points (temporal).
The construction of gDNA and cDNA libraries shares many similarities but also some important differences. Both strategies include nucleic acid purification, sample preparation (e.g., restriction digestion), vector cloning, vector introduction into a suitable host (e.g., transformation or transduction), and clonal selection. As the starting materials are different between the gDNA library and the cDNA library, their purification and preparation employ different approaches; however, once the gDNA or cDNA fragments are cloned into the desired vector, the same workflow may be followed.
For genomic library preparation, gDNA is purified from the organism, tissues, or cells of interest. Extracted gDNA is then digested, and the resulting fragments are isolated and ligated into the vector of interest with compatible ends. The genome is often partially digested with a restriction enzyme that has frequently occurring cutting sites, so that the fragments overlap in sequence and the cloned inserts can be mapped (Figure 5).
Figure 5. Schematic diagram of complete vs. partial digestion of a fragment by a restriction enzyme with four cutting sites. Partial digestion results in overlapping sequences among fragments for mapping. (Only some possible partially-digested fragments are shown here for simplicity.)
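The difference between complete and partial digestion can be made concrete with a short Python sketch. It assumes a linear fragment and treats each cutting site as a single coordinate. The fragment length and cut positions are invented for illustration.

```python
from itertools import combinations


def digestion_fragments(length, cut_positions):
    """Return (complete, partial) fragment lists for a linear DNA.

    Fragments are (start, end) coordinate pairs. `complete` holds the
    fragments produced when every site is cut. `partial` holds every
    fragment that can arise when any subset of sites is cut, which
    includes the uncut molecule and the complete-digestion fragments.
    """
    bounds = [0] + sorted(cut_positions) + [length]
    complete = list(zip(bounds[:-1], bounds[1:]))
    # Any two boundaries delimit a fragment that some combination of
    # cut and uncut sites can produce.
    partial = list(combinations(bounds, 2))
    return complete, partial


# Hypothetical 10-kb fragment with four cutting sites.
complete, partial = digestion_fragments(10_000, [1_500, 4_000, 6_200, 8_800])

print(len(complete), "fragments after complete digestion")
print(len(partial), "possible fragments after partial digestion")

# Fragments that share sequence with another fragment are what allow
# the cloned inserts to be ordered relative to one another.
overlapping = [(a, b) for a, b in combinations(partial, 2)
               if a[0] < b[1] and b[0] < a[1]]
print(len(overlapping), "overlapping fragment pairs")
```

With four sites, complete digestion gives five non-overlapping fragments, whereas partial digestion can give up to fifteen distinct fragments, many of which overlap.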
Vector selection for gDNA libraries is an important consideration because the gene fragments used in the library constructions are often large (e.g., >20 kb). The choice of cloning vector, in turn, determines the method to deliver insert-carrying vectors into the host (Table 1) [1].
Table 1. Common vector types, cloned fragment lengths, and vector delivery methods in library construction.
Ligation products or recombinant DNA can be introduced directly into bacterial cells via transformation or packaged into bacteriophage for infection or “transduction” of the host cells (Figure 6). The transformed or transduced cells are intended for subsequent archiving, expansion, and sequencing in downstream experiments. Whole-genome sequences of many organisms, including the first whole human genome sequence in the early 2000s, were determined using this basic strategy [2].
Figure 6. Schematic workflow of genomic library preparation using a λ phage vector. A genomic DNA sample is partially digested with Sau3AI, after which ~20-kb fragments (ideal size for viral packaging) are isolated for ligation with the viral gene fragments. The left and right arms of the λ vector comprise essential components for viral growth in the bacterial cells.
For cDNA library preparation, total RNA is extracted from a biological source (e.g., cells, tissue, etc.), after which the mRNA is reverse transcribed into complementary DNA (cDNA). This process is known as first-strand cDNA synthesis. The second strand is then synthesized to obtain double-stranded cDNAs. The resulting double-stranded fragments may be ligated directly into a blunt-end cloning vector (random cloning), or “tagged” at the ends with restriction sites for directional cloning (Figure 7).
cDNA libraries that provide good, faithful representation of the expressed genome depend on several factors including the quality and integrity of the source mRNA population. For the reverse transcription steps, it is also crucial that the reverse transcriptase is capable of synthesizing cDNA from a mixed and complex population, including long RNA templates and rare RNA transcripts, for adequate coverage within the libraries (see reverse transcriptase choices). Using the basic strategy outlined in Figure 7, many cDNA library preparations were used to construct comprehensive collections including the Mammalian Gene Collection (MGC), the largest NIH-sponsored public collection of cDNA clone libraries of mammalian species including human, mouse, and rat.
Figure 7. cDNA cloning strategies using mRNA with a poly-A tail. In random (non-directional) cloning, double-stranded cDNAs are ligated directly to a blunt-end cloning vector. In directional cloning, adapters with rare restriction sites (e.g., NotI and SalI) are ligated to the double-stranded cDNA ends to clone into a vector with compatible ends.
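The sequence relationship between the mRNA and the two cDNA strands can be shown with a small Python sketch. It models only the base pairing, not the enzymology, and the mRNA sequence is invented for illustration.

```python
def first_strand_cdna(mrna):
    """Return the first-strand cDNA (5'->3'), the DNA reverse complement of the mRNA."""
    pairing = {"A": "T", "U": "A", "G": "C", "C": "G"}
    return "".join(pairing[base] for base in reversed(mrna.upper()))


def second_strand_cdna(first_strand):
    """Return the second strand (5'->3'), the reverse complement of the first strand."""
    pairing = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(pairing[base] for base in reversed(first_strand.upper()))


# Hypothetical short mRNA with a poly-A tail.
mrna = "AUGGCCUAAAAAAA"

first = first_strand_cdna(mrna)
second = second_strand_cdna(first)

print(first)   # begins with a T run complementary to the poly-A tail
print(second)  # same sequence as the mRNA, with T in place of U
```

The run of T at the 5′ end of the first strand corresponds to the region where an oligo(dT) primer would anneal to the poly-A tail.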
Following library construction, one of the goals is to characterize the clones by sequencing the inserts. Insert sizes represented within these libraries can often range from 25 kb to 300 kb, depending on the type of vector and the genome size of the organism of interest [1]. For Sanger sequencing, once the most widespread method for DNA sequencing, the upper limit of a sequencing reaction with good-quality reads is generally less than 1 kb. To overcome this limitation, researchers can turn to shotgun cloning and sequencing. In this approach, the large cloned inserts are further fragmented by physical or enzymatic means and subcloned into another vector; the smaller cloned fragments are then sequenced. Using bioinformatics programs, these sequences are then reassembled, based on sequence overlaps, into contiguous sequences (termed “contigs”) to ultimately obtain the original long sequence (Figure 8).
Figure 8. Schematic workflow of shotgun cloning.
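The reassembly step can be illustrated with a toy greedy overlap assembler in Python. This is a deliberately simplified sketch: it assumes error-free reads that all come from the same strand and a sequence without long repeats, none of which holds for real data. The sequence, read boundaries, and function names are invented for the example.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 if the longest match is shorter than `min_len`.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the longest overlap."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates
    # Drop reads that are entirely contained in another read.
    reads = [r for r in reads if not any(r != o and r in o for o in reads)]
    while len(reads) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no remaining overlaps: the contigs stay separate
        merged = reads[best_i] + reads[best_j][best_k:]
        reads = [r for idx, r in enumerate(reads)
                 if idx not in (best_i, best_j)] + [merged]
    return reads


# Hypothetical 25-nt "insert" and three overlapping reads derived from it.
original = "ATGGCGTACGTTAGCCTAGGATCCA"
reads = [original[12:25], original[0:10], original[6:16]]

contigs = greedy_assemble(reads)
print(contigs)
```

Production assemblers use far more robust methods, but the principle of joining reads through shared overlaps is the same.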
Shotgun sequencing has been instrumental in whole-genome sequencing of many organisms, ranging from viruses and bacteria to humans. The method can be used to sequence a genome de novo, as well as to improve the quality of an already-sequenced genome by verifying reads and filling in gaps. During the first sequencing of the human genome, the publicly funded Human Genome Project employed shotgun sequencing of large genomic fragments that had been cloned into a bacterial artificial chromosome (BAC) vector. The genomic positions of the cloned fragments had been defined prior to shotgun cloning, making their shotgun sequence assembly easier. Hence, this method is known as hierarchical shotgun sequencing (Figure 9A). It is also called clone-by-clone sequencing due to the use of BAC clones as a source [3,4].
Concurrent with the Human Genome Project, a privately funded whole-genome sequencing project led by Craig Venter applied shotgun sequencing directly to human genomic DNA (instead of cloned fragments that had already been mapped). This process is known as the whole-genome shotgun approach (Figure 9B) [5]. In theory, whole-genome shotgun sequencing requires no prior information about the genome or genetic maps, and would save time and resources. Nevertheless, it is helpful to have reference genetic maps during sequence assembly because the whole-genome shotgun approach requires a large amount of computational power, especially for organisms with sizable genomes. Genetic mapping or fingerprinting is routinely carried out using restriction enzymes [4], as in the RFLP and AFLP methods.
Figure 9. Schematic workflow of two shotgun sequencing approaches used in whole human genome sequencing.