Transposition of IS10 from the host Escherichia coli genome to a plasmid may lead to cloning artefacts

Publication: MOLECULAR GENETICS AND GENOMICS 266, 216-222 Authors: Kovarik, A., Matzke, MA., Matzke, AJM., Koukalova, B. Year: 2001


During recloning of Nicotiana tabacum L. repetitive sequence R8.3 in Escherichia coli, a modified clone that differed from the original by the insertion of an IS10 sequence was unintentionally produced. The insert was flanked by a 9-bp direct repeat derived from the R8.3 sequence; a 9-bp duplication of acceptor DNA at the site of insertion is characteristic of IS10 transposition events. A database search using the FASTA program showed IS10 and other prokaryotic IS elements inserted into numerous eukaryotic clones. Unexpectedly, IS10, which is not a natural component of the E. coli genome, appeared to be by far the most frequent contaminant of DNA databases among the several IS sequences tested. In the GenEMBL database, the IS10 query sequence yielded positive scores with more than 500 eukaryotic clones. Insertions of shortened IS10 sequences having only one intact terminal inverted repeat were commonly found. Most full-length IS10 insertions (32 out of 40 analyzed) were flanked by 9-bp direct repeats having the consensus 5'-NPuCNN-NGPyN-3' with a strong preference for 5'-TGCTNA-GNN-3'. One insertion was flanked by an inverted repeat of more than 400 bp in length. PCR amplification and Southern analysis revealed the presence of IS10 sequences in E. coli strains commonly used for DNA cloning, including some reported to be Tn10-free. No IS10-specific PCR product was obtained with N. tabacum or human DNA. Our data suggest that transposition of IS10 elements may accompany cloning steps, particularly into large BAC vectors. This might lead to the relatively frequent contamination of DNA databases by this bacterial sequence. It is estimated that approximately one in every thousand eukaryotic clones in the databases is contaminated by IS-derived sequences. We recommend checking submitted sequences for the presence of IS10 and other IS elements. In addition, DNA databases should be corrected by removing contaminating IS sequences.
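
The 9-bp target-site duplication described above lends itself to a simple check. The following is a minimal Python sketch, not the authors' method: it assumes the hyphens in the quoted consensus are only visual separators (so the site reads as nine positions, with N = any base, Pu = A or G, Py = C or T), and the function names and the toy sequence in the usage note are hypothetical, chosen for illustration only.

```python
import re

# Consensus quoted in the abstract, read as nine positions (assumption:
# hyphens are separators). N = any base, Pu = purine, Py = pyrimidine.
CONSENSUS = re.compile(r"^[ACGT][AG]C[ACGT]{3}G[CT][ACGT]$")   # NPuCNNNGPyN
PREFERRED = re.compile(r"^TGCT[ACGT]AG[ACGT]{2}$")             # TGCTNAGNN


def matches_consensus(site: str) -> bool:
    """True if a 9-bp site fits the general consensus."""
    return bool(CONSENSUS.match(site.upper()))


def matches_preferred(site: str) -> bool:
    """True if a 9-bp site fits the strongly preferred pattern."""
    return bool(PREFERRED.match(site.upper()))


def flanking_direct_repeat(clone: str, start: int, end: int, size: int = 9):
    """Return the duplicated target site if the `size` bases immediately
    before clone[start:end] equal the `size` bases immediately after it;
    otherwise return None. `start`/`end` are 0-based, end-exclusive
    coordinates of the putative insertion (hypothetical inputs, e.g.
    taken from a similarity-search hit)."""
    if start < size or end + size > len(clone):
        return None
    left = clone[start - size:start].upper()
    right = clone[end:end + size].upper()
    return left if left == right else None
```

For example, with a toy clone built as `"AAAA" + "TGCTAAGCC" + "GGGGGTTTTT" + "TGCTAAGCC" + "CCCC"`, where the middle ten bases stand in for an inserted element, `flanking_direct_repeat(clone, 13, 23)` returns the flanking site, which can then be passed to `matches_consensus`. Exact matching of the two flanks is a deliberate simplification; real database entries may contain sequencing errors or truncated insertions.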