Why rational diversity?
Conventional protocols create variants of a gene by more or less random mutagenesis, for example by error-prone PCR. With such methods you have no control over the position or the nature of the mutations. You will create many silent mutations and many unwanted stop codons. You will also be able to create only a tiny fraction of all possible sequences, which makes it very unlikely that you will identify the best variant. Suppose you want to mutagenize a 300 nt sequence coding for a 100 aa protein, and you have created a library of 10^9 variants by error-prone PCR in which each clone carries 15 mismatches on average.
Because of the degeneracy of the genetic code, only 76% of the mutated codons will code for a different aa, and 5% will code for a stop codon. If, on average, 15 codons are mutated, the chance that a clone contains at least one stop codon is 1 - 0.95^15, or about 54%. About half of your library is not even full length! Given a 100 aa protein and variants that each carry exactly 15 altered aa, on the order of 10^49 variants are possible. Thus, a library of 10^9 variants covers only a tiny fraction of all possible variants, roughly 1 in 10^40. This is the same ratio as one gram compared to the mass of five million suns.
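The arithmetic in this example can be checked with a few lines of Python. This is only a sketch: the variable names are made up for illustration, the 5% stop-codon rate is taken as stated in the text, and the solar mass of 1.989 x 10^33 g is a standard value that the text does not give. Two ways of counting the possible variants are shown. Picking the 15 positions one after another reproduces the order of magnitude quoted in the text, while counting unordered sets of 15 positions is stricter and gives a smaller number.

```python
from math import comb

# Values taken from the example in the text.
MUTATED_CODONS = 15        # altered codons per clone
P_STOP = 0.05              # chance that a mutated codon becomes a stop codon
PROTEIN_LENGTH = 100       # amino acids
ALTERNATIVE_RESIDUES = 19  # other amino acids available at a mutated position
LIBRARY_SIZE = 10**9

# Probability that at least one of the 15 mutated codons is a stop codon.
p_truncated = 1 - (1 - P_STOP) ** MUTATED_CODONS

# Count 1: positions picked one after another (order counted, repeats allowed).
# This is an upper bound and matches the order of magnitude in the text.
sequential_count = (PROTEIN_LENGTH * ALTERNATIVE_RESIDUES) ** MUTATED_CODONS

# Count 2: unordered sets of 15 distinct positions, 19 alternatives at each.
unordered_count = comb(PROTEIN_LENGTH, MUTATED_CODONS) * ALTERNATIVE_RESIDUES ** MUTATED_CODONS

# The gram-versus-suns comparison (solar mass is an assumed standard value).
SUN_MASS_G = 1.989e33
mass_ratio = 1 / (5e6 * SUN_MASS_G)

print(f"clones with a stop codon: {p_truncated:.0%}")
print(f"variants, sequential count: {sequential_count:.1e}")
print(f"variants, unordered count:  {unordered_count:.1e}")
print(f"coverage, sequential count: {LIBRARY_SIZE / sequential_count:.1e}")
print(f"coverage, unordered count:  {LIBRARY_SIZE / unordered_count:.1e}")
print(f"1 g / five million suns:    {mass_ratio:.1e}")
```

With either count, a library of a billion clones samples a vanishingly small part of the sequence space, so the conclusion does not depend on which count is used.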
To shrink your haystack, it helps to have information (in particular structural data) about the positions that are likely to contribute to the protein's performance (e.g. interaction, catalysis, heat stability). You then vary just these positions, and perhaps allow only a limited set of substitutions at each of them (e.g. just polar, acidic or basic aa). With this strategy you quickly arrive at manageable library sizes that are far more likely to be screened successfully. Libraries of this type can only be generated synthetically.
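To see how quickly such restrictions shrink the library, the size of a rationally designed library can be computed as the product of the number of residues allowed at each varied position. The sketch below uses a purely hypothetical design: the position numbers and residue sets are invented for illustration and do not come from any real protein.

```python
from math import prod

# Hypothetical design: position in the protein -> residues allowed there.
# All positions and residue sets are illustrative assumptions.
ALL_20 = "ACDEFGHIKLMNPQRSTVWY"
allowed_residues = {
    23: "DE",      # acidic only
    27: "KRH",     # basic only
    45: "STNQ",    # polar only
    46: "AVLI",    # small hydrophobic only
    71: ALL_20,    # fully randomized
    74: ALL_20,    # fully randomized
}

# Library size is the product of the choices at each varied position.
library_size = prod(len(residues) for residues in allowed_residues.values())

# For comparison: the same six positions with all 20 aa allowed everywhere.
unrestricted_size = len(ALL_20) ** len(allowed_residues)

print(library_size)       # 2 * 3 * 4 * 4 * 20 * 20 = 38400
print(unrestricted_size)  # 20**6 = 64000000
```

A library of a few tens of thousands of defined variants can be sampled essentially completely in a screen, which is not possible for the random library described earlier.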
Another approach to creating libraries is allele shuffling by PCR recombination. Here you have to provide the physical DNA of the homologous sequences, and the sequence identity between them needs to be high enough for recombination. The initial fragmentation of the DNA is random, and the nature of the procedure makes it unlikely that two adjacent SNPs will ever be separated and recombined. Here, too, synthetic libraries are the solution. There is no need for any physical DNA of the alleles, and two adjacent SNPs are shuffled just as readily as distant ones.
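The difference can be illustrated in silico. In a synthetic design, every combination of the alleles' differences can simply be enumerated, regardless of how close the differences lie to each other. The sketch below uses two short, invented parental sequences that differ at three positions, two of which are directly adjacent; the sequences and names are assumptions for illustration only.

```python
from itertools import product

# Two hypothetical, pre-aligned parental alleles of equal length.
# They differ at positions 5, 6 (adjacent) and 12 (0-based).
parents = [
    "ATGGCTAAAGGTCTG",
    "ATGGCAGAAGGTTTG",
]

# For each alignment column, collect the bases observed in the parents.
choices = [sorted(set(column)) for column in zip(*parents)]

# Columns with more than one base are the SNP positions.
snp_positions = [i for i, bases in enumerate(choices) if len(bases) > 1]

# Enumerate every combination of the observed bases across all columns.
variants = ["".join(bases) for bases in product(*choices)]

print(snp_positions)   # [5, 6, 12]
print(len(variants))   # 2**3 = 8

# A recombinant that separates the two adjacent SNPs:
# position 5 from the first parent, position 6 from the second.
print("ATGGCTGAAGGTCTG" in variants)   # True
```

All eight combinations appear exactly once, including those that split the adjacent pair, which is the behaviour the text attributes to synthetic shuffling libraries.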



