Site-specific recombination, also known as conservative site-specific recombination, is a type of genetic recombination in which DNA strand exchange takes place between segments possessing at least a certain degree of sequence homology. Enzymes known as site-specific recombinases (SSRs) perform rearrangements of DNA segments by recognizing and binding to short, specific DNA sequences (sites), at which they cleave the DNA backbone, exchange the two DNA helices involved, and rejoin the DNA strands. In some cases the presence of a recombinase enzyme and the recombination sites is sufficient for the reaction to proceed; in other systems a number of accessory proteins and/or accessory sites are required. Many genome modification strategies rely on SSRs, among them recombinase-mediated cassette exchange (RMCE), an advanced approach for the targeted introduction of transcription units into predetermined genomic loci.
Site-specific recombination systems are highly specific, fast, and efficient, even when faced with complex eukaryotic genomes. They are employed naturally in a variety of cellular processes, including bacterial genome replication, differentiation and pathogenesis, and movement of mobile genetic elements. For the same reasons, they present a potential basis for the development of genetic engineering tools.
Recombination sites are typically between 30 and 200 nucleotides in length and consist of two motifs with a partial inverted-repeat symmetry, to which the recombinase binds, and which flank a central crossover sequence at which the recombination takes place. The pairs of sites between which the recombination occurs are usually identical, but there are exceptions (e.g. attP and attB of λ integrase).
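The site architecture described above (two recombinase-binding motifs with inverted-repeat symmetry flanking a central crossover sequence) can be illustrated with a minimal Python sketch. It uses loxP, the 34 bp site recognised by Cre, as an example; the sequence is the one commonly cited for loxP, and the helper names `split_site` and `arm_symmetry` are hypothetical, introduced only for this illustration.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(seq))


# loxP as commonly cited: 13 bp arm, 8 bp central region, 13 bp arm.
LOXP = "ATAACTTCGTATAGCATACATTATACGAAGTTAT"


def split_site(site: str, arm_len: int):
    """Split a site into (left arm, central crossover region, right arm)."""
    return site[:arm_len], site[arm_len:-arm_len], site[-arm_len:]


def arm_symmetry(site: str, arm_len: int) -> float:
    """Fraction of arm positions at which the right arm equals the
    reverse complement of the left arm (1.0 = perfect inverted repeat)."""
    left, _, right = split_site(site, arm_len)
    expected = reverse_complement(left)
    matches = sum(a == b for a, b in zip(expected, right))
    return matches / arm_len


left, crossover, right = split_site(LOXP, 13)
print(left, crossover, right)
print("arm symmetry:", arm_symmetry(LOXP, 13))
# The central region is not its own reverse complement, which gives
# the site as a whole a direction.
print("central region asymmetric:", reverse_complement(crossover) != crossover)
```

In loxP the two arms happen to be perfect inverted repeats; for sites with only partial symmetry the same function would return a value below 1.0.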
Based on amino acid sequence homologies and mechanistic relatedness, most site-specific recombinases are grouped into one of two families: the tyrosine (Tyr) recombinase family or serine (Ser) recombinase family. The names stem from the conserved nucleophilic amino acid residue present in each class of recombinase which is used to attack the DNA and which becomes covalently linked to it during strand exchange. The earliest identified members of the serine recombinase family were known as resolvases or DNA invertases, while the founding member of the tyrosine recombinases, lambda phage integrase (using attP/B recognition sites), differs from the now well-known enzymes such as Cre (from the P1 phage) and FLP (from the yeast Saccharomyces cerevisiae). Famous serine recombinases include enzymes such as gamma-delta resolvase (from the Tn1000 transposon), Tn3 resolvase (from the Tn3 transposon), and φC31 integrase (from the φC31 phage).
Although the individual members of the two recombinase families can perform reactions with the same practical outcomes, the families are unrelated to each other, having different protein structures and reaction mechanisms. Unlike tyrosine recombinases, serine recombinases are highly modular, as was first hinted by biochemical studies and later shown by crystallographic structures. Knowledge of these protein structures could prove useful when attempting to re-engineer recombinase proteins as tools for genetic manipulation.
Recombination between two DNA sites begins with the recognition and binding of these sites (one site on each of two separate double-stranded DNA molecules, or at least two distant segments of the same molecule) by the recombinase enzyme. This is followed by synapsis, i.e. bringing the sites together to form the synaptic complex. It is within this synaptic complex that the strand exchange takes place, as the DNA is cleaved and rejoined by controlled transesterification reactions. During strand exchange, each double-stranded DNA molecule is cut at a fixed point within the crossover region of the recognition site, releasing a deoxyribose hydroxyl group, while the recombinase enzyme forms a transient covalent bond to a DNA backbone phosphate. This phosphodiester bond between the DNA backbone phosphate and the hydroxyl group of the nucleophilic serine or tyrosine residue conserves the energy that was expended in cleaving the DNA. Energy stored in this bond is subsequently used for the rejoining of the DNA to the corresponding deoxyribose hydroxyl group on the other DNA molecule. The entire reaction therefore proceeds without the need for external energy-rich cofactors such as ATP.
Although the basic chemical reaction is the same for both tyrosine and serine recombinases, there are some differences between them. Tyrosine recombinases, such as Cre or FLP, cleave one DNA strand at a time at points that are staggered by 6–8 bp, linking the 3′ end of the strand to the hydroxyl group of the tyrosine nucleophile (Fig. 1). Strand exchange then proceeds via a crossed-strand intermediate, analogous to the Holliday junction, in which only one pair of strands has been exchanged.
The mechanism and control of serine recombinases are much less well understood. This group of enzymes was only discovered in the mid-1990s and is still relatively small. The now classical members gamma-delta and Tn3 resolvase, but also newer additions such as the φC31, Bxb1, and R4 integrases, cut all four DNA strands simultaneously at points that are staggered by 2 bp (Fig. 2). During cleavage, a protein–DNA bond is formed via a transesterification reaction, in which a phosphodiester bond is replaced by a phosphoserine bond between a 5′ phosphate at the cleavage site and the hydroxyl group of the conserved serine residue (S10 in resolvase).
It is still not entirely clear how the strand exchange occurs after the DNA has been cleaved. However, it has been shown that the strands are exchanged while covalently linked to the protein, with a resulting net rotation of 180°. The most quoted (but not the only) model accounting for these facts is the "subunit rotation model" (Fig. 2). Irrespective of the model, the DNA duplexes are situated outside of the protein complex, and a large movement of the protein is needed to achieve the strand exchange. In this case the recombination sites are slightly asymmetric, which allows the enzyme to tell apart the left and right ends of the site. When generating products, left ends are always joined to the right ends of their partner sites, and vice versa. This causes different recombination hybrid sites to be reconstituted in the recombination products. Joining of left ends to left or right to right is avoided due to the asymmetric "overlap" sequence between the staggered points of top and bottom strand exchange, which is in stark contrast to the mechanism employed by tyrosine recombinases.
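The left-to-right joining rule can be sketched symbolically. The minimal Python example below treats each site as a left half-site, an overlap, and a right half-site, and joins the left end of each site to the right end of its partner. The labels B, B', P and P', the 2 bp placeholder overlap "TT", and the names `Site` and `recombine` are assumptions made for illustration; the hybrid products are named attL and attR following the usual convention for attB × attP recombination. Requiring the two overlaps to match is a simplification of this sketch.

```python
from typing import NamedTuple, Tuple


class Site(NamedTuple):
    left: str     # label of the left half-site
    overlap: str  # sequence between the staggered cut points
    right: str    # label of the right half-site


def recombine(a: Site, b: Site) -> Tuple[Site, Site]:
    """Join the left end of each site to the right end of its partner."""
    # Simplifying assumption: rejoining requires identical overlaps.
    if a.overlap != b.overlap:
        raise ValueError("overlap sequences must match for rejoining")
    return Site(a.left, a.overlap, b.right), Site(b.left, b.overlap, a.right)


# Symbolic half-site labels; "TT" is a placeholder 2 bp overlap.
att_b = Site("B", "TT", "B'")
att_p = Site("P", "TT", "P'")

att_l, att_r = recombine(att_b, att_p)
print(att_l)  # hybrid site: left half of attB, right half of attP
print(att_r)  # hybrid site: left half of attP, right half of attB
```

Because each product combines half-sites from both parents, neither product is identical to attB or attP, which is the sense in which hybrid sites are reconstituted.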
The reaction catalysed by Cre recombinase, for instance, may lead to excision of the DNA segment flanked by the two sites (Fig. 3A), but may also lead to integration or inversion of the orientation of the flanked DNA segment (Fig. 3B). The outcome of the reaction is dictated mainly by the relative locations and orientations of the sites that are to be recombined, but also by the innate specificity of the site-specific system in question. Excisions and inversions occur if the recombination takes place between two sites that are found on the same molecule (intramolecular recombination), and if the sites are in the same (direct repeat) or in an opposite orientation (inverted repeat), respectively. Insertions, on the other hand, take place if the recombination occurs between sites that are situated on two different DNA molecules (intermolecular recombination), provided that at least one of these molecules is circular. Most site-specific systems are highly specialised, catalysing only one of these different types of reaction, and have evolved to ignore the sites that are in the "wrong" orientation.
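The dependence of the outcome on site location and orientation can be sketched as a toy string model in Python. This is an illustration under stated assumptions, not a model of the enzymology: DNA is represented by its top strand only, the site is loxP (sequence as commonly cited), a circular molecule is written as a string opened at an arbitrary point outside its site, and the function names are hypothetical.

```python
from typing import List, Optional, Tuple

SITE = "ATAACTTCGTATAGCATACATTATACGAAGTTAT"  # loxP, forward orientation


def reverse_complement(seq: str) -> str:
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(seq))


SITE_RC = reverse_complement(SITE)  # the same site in reverse orientation


def find_sites(dna: str) -> List[Tuple[int, int]]:
    """Return sorted (position, orientation) pairs; +1 forward, -1 reverse."""
    hits = []
    for pattern, orientation in ((SITE, +1), (SITE_RC, -1)):
        start = dna.find(pattern)
        while start != -1:
            hits.append((start, orientation))
            start = dna.find(pattern, start + 1)
    return sorted(hits)


def recombine_intramolecular(dna: str) -> Tuple[str, str, Optional[str]]:
    """Recombine two sites on one molecule.

    Direct repeats give excision (remaining molecule plus excised circle);
    inverted repeats give inversion of the intervening segment.
    """
    hits = find_sites(dna)
    if len(hits) != 2:
        raise ValueError("expected exactly two sites")
    (p1, o1), (p2, o2) = hits
    if o1 == o2:
        # One site stays behind; the other leaves with the excised circle.
        return "excision", dna[:p1] + dna[p2:], dna[p1:p2]
    inner = dna[p1 + len(SITE):p2]
    flipped = dna[:p1 + len(SITE)] + reverse_complement(inner) + dna[p2:]
    return "inversion", flipped, None


def integrate(target: str, circle: str) -> str:
    """Insert a one-site circular molecule at the single site of target."""
    target_hits = find_sites(target)
    if len(target_hits) != 1 or len(find_sites(circle)) != 1:
        raise ValueError("expected exactly one site on each molecule")
    (tp, to), = target_hits
    if find_sites(circle)[0][1] != to:
        # A circle can be read on either strand; match the orientations.
        circle = reverse_complement(circle)
    cp = find_sites(circle)[0][0]
    opened = circle[cp:] + circle[:cp]  # rotate so the circle starts at its site
    return target[:tp] + opened + target[tp:]


direct = "GGGG" + SITE + "ACGTTT" + SITE + "CCCC"
inverted = "GGGG" + SITE + "ACGTTT" + SITE_RC + "CCCC"

kind, remaining, circle = recombine_intramolecular(direct)
print(kind, remaining, circle)
print(recombine_intramolecular(inverted)[:2])
print(integrate(remaining, circle) == direct)  # integration reverses excision
```

In this model excision and integration are exact inverses of each other, and applying the inversion twice restores the original sequence, which mirrors the reversibility of reactions between identical sites.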