Endogenous pararetrovirus sequences associated with 24 nt small RNAs at the centromeres of Fritillaria imperialis L. (Liliaceae), a species with a giant genome
Endogenous pararetroviral sequences are the most commonly found virus sequences integrated into angiosperm genomes. We describe an endogenous pararetrovirus (EPRV) repeat in Fritillaria imperialis, a species that is under study because of its exceptionally large genome (1C = 42 096 Mbp, approximately 240 times larger than that of Arabidopsis thaliana). The repeat (FriEPRV) was identified from Illumina reads using the RepeatExplorer pipeline, and exists in a complex genomic organization at the centromere of most, or all, chromosomes. The repeat was reconstructed into three consensus sequences that formed three interconnected loops, one of which carries sequence motifs expected of an EPRV (including the gag and pol domains). FriEPRV shows sequence similarity to members of the Caulimoviridae pararetrovirus family, with phylogenetic analysis indicating a close relationship to Petuvirus. It is possible that no complete EPRV sequence exists, although our data suggest an abundance that exceeds the genome size of Arabidopsis. Analysis of single nucleotide polymorphisms revealed elevated levels of C→T and G→A transitions, consistent with deamination of methylated cytosine. Bisulphite sequencing revealed high levels of methylation at CG and CHG motifs (up to 100%), and 15–20% methylation, on average, at CHH motifs. FriEPRV's centromeric location may suggest targeted insertion, perhaps associated with meiotic drive. We observed an abundance of 24 nt small RNAs that specifically target FriEPRV, potentially providing a signature of RNA-dependent DNA methylation. Such signatures of epigenetic regulation suggest that the huge genome of F. imperialis has not arisen as a consequence of a catastrophic breakdown in the regulation of repeat amplification.
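
The CG, CHG and CHH motifs reported above are the standard cytosine sequence contexts, where H stands for A, C or T. As an illustration only, the following minimal Python sketch shows how cytosines might be assigned to these contexts and how per-context methylation levels could be summarised. It is not the pipeline used in this study: the function names (`cytosine_context`, `methylation_by_context`) and the input layout (a dictionary mapping a position to methylated and total read counts) are hypothetical, and only forward-strand cytosines are considered.

```python
from collections import defaultdict


def cytosine_context(seq, i):
    """Classify the forward-strand cytosine at index i as 'CG', 'CHG' or 'CHH'.

    Returns None if seq[i] is not a C, if the context runs off the end of
    the sequence, or if it contains an ambiguous base such as N.
    """
    seq = seq.upper()
    if seq[i] != "C":
        return None
    nxt = seq[i + 1:i + 3]          # the one or two bases downstream of the C
    if len(nxt) < 1:
        return None
    if nxt[0] == "G":
        return "CG"
    if len(nxt) < 2 or nxt[0] not in "ACT":
        return None                 # incomplete or ambiguous context
    if nxt[1] == "G":
        return "CHG"
    if nxt[1] in "ACT":
        return "CHH"
    return None


def methylation_by_context(seq, counts):
    """Pool read counts per context and return the methylated fraction.

    counts is an assumed layout: {position: (methylated_reads, total_reads)}.
    """
    totals = defaultdict(lambda: [0, 0])
    for pos, (meth, cov) in counts.items():
        ctx = cytosine_context(seq, pos)
        if ctx is None or cov == 0:
            continue                # skip non-cytosines and uncovered sites
        totals[ctx][0] += meth
        totals[ctx][1] += cov
    return {ctx: m / c for ctx, (m, c) in totals.items()}
```

With a toy sequence such as `"ACGTCAGTCATT"`, the cytosines at indices 1, 4 and 8 fall into the CG, CHG and CHH contexts respectively, so read counts supplied for those positions are pooled separately for each context. The read counts in any such example are invented for illustration and are not data from this study.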