Studies indicate that SSRs with tri- and hexanucleotide motifs predominate in coding regions, the result of selection pressure against mutations that alter the reading frame (Zhang et al.). In humans, the consensus is that SSRs can also originate in coding regions, leading to the appearance of repetitive patterns in protein sequences.
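The reading-frame argument can be made concrete with a little arithmetic: gaining or losing whole repeat units changes the sequence length by a multiple of the motif length, so only motif lengths divisible by three preserve the codon frame for every change in repeat number. The following Python sketch illustrates this; the function name is hypothetical and does not come from the cited studies.

```python
def shifts_reading_frame(motif_length, units_changed):
    """Return True if gaining or losing `units_changed` repeat units of a
    motif `motif_length` bp long changes the length by a non-multiple of 3,
    i.e. shifts the codon reading frame."""
    return (motif_length * abs(units_changed)) % 3 != 0


# Report, for each motif class, whether a single-unit change shifts the frame.
for k in range(1, 7):
    print(k, shifts_reading_frame(k, 1))
```

Only motif lengths 3 and 6 leave the frame intact for a single-unit change, which is consistent with the predominance of tri- and hexanucleotide SSRs in coding regions.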
In protein sequence database studies, it was reported that tandem repeats are common in many proteins, and the mechanisms involved in their genesis may contribute to the rapid evolution of proteins (Katti et al.).
Repeat polymorphisms usually result from the addition or deletion of entire repeat units or motifs.
Therefore, different individuals exhibit variations as differences in repeat numbers. In other words, the polymorphisms observed in SSRs are the result of differences in the number of repeats of the motif caused by polymerase strand-slippage in DNA replication or by recombination errors. Strand-slippage replication is a DNA replication error in which the template and nascent strands are mismatched. This means that the template strand can loop out, causing contraction.
The nascent strand can also loop out, leading to repeat expansion. Recombination events, such as unequal crossing over and gene conversion, may additionally lead to SSR sequence contractions and expansions.
According to several authors, the longer and purer the repeat, the higher the mutation frequency, whereas shorter repeats with lower purity have a lower mutation frequency. As for their composition, SSRs can be classified according to motif as: (i) perfect, if composed entirely of repeats of a single motif; (ii) imperfect, if a base pair not belonging to the motif occurs between repeats; (iii) interrupted, if a sequence of a few base pairs is inserted into the motif; or (iv) composite, if formed by multiple, adjacent, repetitive motifs (reviewed in Oliveira et al.).
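As an illustration of the simplest class above, the sketch below scans a DNA string for perfect SSRs with motifs of 1 to 6 bp using regular expressions. The minimum repeat counts are illustrative assumptions, not values given in the text, and the function names are hypothetical. Imperfect, interrupted and composite SSRs are not handled here.

```python
import re

# Minimum number of repeat units per motif length (assumed, illustrative values).
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(motif):
    """True if the motif is not itself a repetition of a shorter unit."""
    n = len(motif)
    return not any(
        n % k == 0 and motif[:k] * (n // k) == motif for k in range(1, n)
    )


def find_perfect_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return (start, end, motif, n_units) for each perfect SSR found.

    Coordinates are 0-based and end-exclusive.
    """
    seq = seq.upper()
    hits = []
    for k, n in sorted(min_repeats.items()):
        # A k-bp motif followed by at least n - 1 further exact copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_primitive(motif):  # e.g. skip "ATAT" reported as a 4-mer
                hits.append((m.start(), m.end(), motif, (m.end() - m.start()) // k))
    return sorted(hits)


example = "GG" + "AT" * 6 + "CC" + "CAG" * 5 + "TT"
print(find_perfect_ssrs(example))
```

Checking that each motif is primitive prevents the same tract from being reported again under a longer motif class.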
SSRs have been the most widely used markers for genotyping plants over the past 20 years because they are highly informative, codominant, multi-allele genetic markers that are experimentally reproducible and transferable among related species (Mason).
In particular, SSRs are useful for wild species (i) in studies of diversity measured on the basis of genetic distance; (ii) to estimate gene flow and crossing over rates; and (iii) in evolutionary studies, above all to infer infraspecific genetic relations. On the other hand, for cultivated plants SSRs are commonly used for (i) constructing linkage maps; (ii) mapping loci involved in quantitative traits (QTL); (iii) estimating the degree of kinship between genotypes; (iv) marker-assisted selection; and (v) defining cultivar DNA fingerprints (Jonah et al.).
SSRs have been particularly useful for generating integrated maps for plant species in which full-sib families are used for constructing linkage maps (Garcia et al.).
These markers are enormously useful in studies of population structure, genetic mapping, and evolutionary processes. SSRs with core repeats 3 to 5 nucleotides long are preferred in forensics and parentage analysis. Despite the wide applicability of SSRs as genetic markers since their discovery in the 1980s, little is known about the biological importance of microsatellites (Tautz and Renz; Morgante et al.).
Interestingly, there are substantial data indicating that SSR expansions or contractions in protein-coding regions can lead to a gain or loss of gene function via frameshift mutation or expanded toxic mRNAs.
SSR variations in 5'-UTRs could regulate gene expression by affecting transcription and translation, but expansions in the 3'-UTRs cause transcription slippage and produce expanded mRNA, which can disrupt splicing and may disrupt other cellular functions.
All these effects can eventually lead to phenotypic changes (Li et al.). In fact, variation in the length of DNA triplet repeats has been linked to phenotypic variability in microbes and to several human disorders, including Huntington's disease, which is caused mainly by (CAG)n expansions. Moreover, the frequencies of different codon repeats vary considerably depending on the type of encoded amino acid. In plants, a triplet repeat-associated genetic defect was identified in a wild variety of A.
Expansion of the repeat causes an environment-dependent reduction in the enzyme's activity and severely impairs plant growth, whereas contraction of the expanded repeat can reverse the detrimental effect on the phenotype (Sureshkumar et al.). Historically, tandem repeats have been designated as nonfunctional DNA, mainly because they are highly unstable. With the exception of tandem repeats involved in human neurodegenerative diseases, repeat variation was often believed to be neutral, with no phenotypic consequences (see Gemayel et al.).
The detection of microsatellites in transcripts and regulatory regions of the genome encouraged scientific interest in discovering their possible biological functions. A growing number of publications have presented evidence that microsatellites play a role in relevant processes, such as the regulation of transcription and translation, organization of chromatin, genome size and the cell cycle (Nevo).
As mentioned above, most of the knowledge acquired on microsatellites occurring in genes was obtained by studying humans and animals, indicating their relationship with the manifestation of disease. In bacteria, maintaining numerous microsatellite variants provides a source of highly mutable sequences that enable prompt generation of novel variations, ensuring the survival of the bacterial population in widely varying environments, and adaptation to pathogenesis and virulence.
Nevertheless, few studies have focused on whether the typical instability of microsatellites is linked to phenotypic effects in plants (Li et al.). However, thanks to whole genome sequencing, the important role repeats might play in genomes is being elucidated.
The consensus is that the biological function of a microsatellite is related to its position in the genome. For instance, SSRs in 5'-UTRs serve as protein binding sites, thereby regulating gene translation and protein component and function, as classically demonstrated for the human genes for thymidylate synthase (Horie et al.).
Ten years later, SSR densities in different regions (5'-UTRs, introns, coding exons, 3'-UTRs, and upstream regions) in housekeeping and tissue-specific genes in human and mouse were compared. Additionally, it was suggested that SSRs may have an effect on gene expression and may play an important role in contributing to the different expression profiles of housekeeping and tissue-specific genes (Lawson and Zhang).
Additionally, tri- and hexanucleotide coding repeats appear to be controlled by stronger mutation pressure in coding regions than in other gene regions. The biased distribution of microsatellites and microsatellite motifs also suggests that microsatellites of different types play different roles in different gene regions, such as within promoters, introns and exons in plants (Li et al.).
The results concerning the function of microsatellite-containing transcripts were remarkable. After an enrichment analysis, four pathways, i. Microsatellites located in introns can play a role in the transport and alternative splicing of mRNA and in gene silencing, as well as in the regulation of transcription, acting independently or in combination with SSRs present in 5'-UTR regions (Kalia et al.).
A number of examples of the effects of intronic SSRs in humans were reviewed by Li et al. The 3'-UTR region is also subject to alterations due to the presence of SSRs, which cause slippage during transcription or modification of target regions whose translation is controlled by miRNAs (Li et al.).
An example of the effect of polymerase slippage in 3'-UTR regions is the multisystem disorder myotonic dystrophy type 1, caused by expansion of a CTG trinucleotide repeat. Finally, microsatellites are known to affect expression if present in gene promoters and intergenic regions. In the promoter, SSRs render gene expression vulnerable to possible alterations caused by expansion or contraction of repeat sequences.
These alterations result in an increase or reduction in the level of gene expression caused by changes in transcription factor binding sites and can even culminate in gene silencing. Tandem repeats in intergenic regions can cause changes in the secondary structure of the DNA by forming loops and altering the chromatin, which indirectly results in alterations in the expression of nearby genes (Gao et al.). In spite of the scarcity of studies on the functional changes brought about by SSRs in plants, their effects are believed to be similar to those found in humans.
For instance, trinucleotide repeats in the Arabidopsis genome were found to be twice as frequent in coding regions, suggesting selection for certain stretches of amino acids (Morgante et al.).
Using data generated in our laboratory, we have compared the percentage of SSRs having mono-, di-, tri-, tetra-, penta- and hexanucleotide motifs in expressed sequences, gene-rich regions, BAC-end sequences and chloroplast genome sequences of Passiflora edulis, and identified the prevalent motif in each case.
We also noticed the prevalence of tri- and hexanucleotide motifs in expressed sequences (Figure 1). Recently, based on the genomes available in the Phytozome database, Zhao et al. The majority were identified in open reading frames, indicating a possible effect on the gene product and consequently on gene function. An important example of the functioning of SSRs in plants was reported by Liu et al.
Another important aspect is the instability of microsatellites. Studies conducted on transgenic plants of A. This peculiarity means that SSR markers can be used to assess the impacts of mutagenic contaminants. Mutagenesis induced in Pisum sativum by high doses of lead was detected based on the instability of microsatellites at a locus involved in metabolizing glutamine (Rodriguez et al.).
Microsatellite alterations associated with diseases in humans are widely known and can give the false impression that the effects of these mutations are predominantly adverse. On the contrary, some examples provide evidence that SSR alleles can offer potential selective advantages (Kashi and King).
SSRs are currently regarded as relevant to population adaptation and phenotypic plasticity within and across generations, and gene-associated tandem repeats act as evolutionary facilitators, providing abundant, robust variation and thus enabling rapid development of new forms (Nevo).
The development of SSR markers can basically be divided into the following stages: (i) prior knowledge of nucleotide sequences in which SSRs occur; (ii) design of oligonucleotides or primers complementary to the regions flanking the SSR; (iii) validation of primers by PCR and electrophoresis of the reaction product; and (iv) detection of polymorphisms among individuals (Mason).
A schematic workflow showing how an SSR marker can be obtained is given in Figure 2. Interestingly, the efficiency of SSR marker development was found to be associated with the microsatellite class.
In rice, for instance, the rate of successful amplification varied from
Microsatellites were originally developed from both coding and non-coding regions of plant genomes, and several sources were used to search for SSRs, including a variety of DNA libraries (genomic, genomic-enriched for SSR, bacterial artificial chromosome and cDNA libraries), as well as public databases, including expressed sequence tag (EST) databases (see Hanai et al.).
In prospecting for SSRs, the first step consists of constructing enriched genomic libraries, and various enrichment methods have been successfully developed (Billotte et al.).
To construct and sequence genomic libraries, the DNA is fragmented, ligated to adaptors and inserted into vectors for transforming Escherichia coli. Most protocols involve a stage of enrichment for repetitive sequences that can be achieved using selective hybridization, PCR or both techniques (Senan et al.).
In enrichment by hybridization, positive clones are detected using radioactively or chemically labeled SSR probes. Finally, these clones are selected by PCR amplification and sequencing (Semagn et al.).
Another way of enriching a library is to use biotinylated SSR probes that are captured by streptavidin-coated beads (Nunome et al.). The captured DNA is eluted, amplified, cloned and sequenced. The enriched libraries are screened to identify clones containing SSRs, producing the subsample of repetitive sequences that is intrinsic to this approach. PCR-based methods can bias the sampling of repetitive sequences in non-enriched libraries, since fragment selection and amplification are dependent on complementarity with specific primers for the SSR and cloning vector.
However, non-enriched libraries and alternative methods derived from other molecular markers e. Various NGS-based projects have been developed over the last few decades, generating an enormous quantity of sequences made available in public databases and widely used for prospecting for microsatellites. However, because of the high cost of the Sanger method when sequencing complete genomes, it has been replaced by NGS platforms or a combination of both methods (Schnable et al.).
Third-generation platforms are also currently available, including a platform developed by Pacific Biosciences (PacBio), based on a new sequencing technology, SMRT sequencing, which has the advantage of producing longer DNA reads. Each platform has specific characteristics in terms of the number and size of reads generated, run time, as well as the accuracy and cost of each base read, with both advantages and disadvantages compared to other platforms (Egan et al.).
In order to advise researchers on the choice of sequencing technology, Alic et al. Control error analysis is one of the most important steps in sequencing data analysis, mainly in de novo sequencing projects that lack a reference genome.
Furthermore, sequences that contain repetitive regions are challenges to be overcome by error correction methods, due to their vulnerability to errors. Initiatives for sequencing the complete genomes of various species use combinations of different platforms with the aim of incorporating the best features of each and extracting the maximum amount of information.
However, PacBio SMRT sequencing technology is being considered an economically viable alternative for discovering microsatellites (Grohme et al.). With the advent of NGS, it was necessary to create databases for storing the information generated. The online database platforms for nucleotide, protein and transcript data available for the majority of plant species are relatively small when compared to model species, such as A. Since the protocols for obtaining and isolating de novo SSR loci can be expensive and not viable in some cases, the investigation of these elements in silico i.
This approach is possible only because SSR loci primers are transferable among different, phylogenetically related species (Kuleung et al.). The possibility of interchanging this genetic information is ascribed to the synteny between related species. The conservation of this information could indicate that these loci confer evolutionary advantages, and are therefore subject to low selection pressure (Zhu).
Microsatellites found in the chloroplast genome of higher plants (cpSSRs) consist basically of mononucleotide repeats (A and T) (Bryan et al.). In terms of transferability, cpSSRs are particularly promising for the study of phylogenetically distant species, since the regions flanking them are strongly conserved, so that universal primers can be developed (Weising and Gardner).
After identifying the sequences containing SSRs, specific primers (between 18 and 25 bp in length) complementary to the flanking regions must be synthesized, followed by amplification and polymorphism testing. According to Guichoux et al. These authors itemized possible solutions to help researchers solve these problems, such as stuttering or shadow bands, non-template addition of a nucleotide by the Taq polymerase, primer mispriming, etc.
Once the SSR markers have been produced, genotyping can begin. It is a relatively easy and low-cost procedure. The allele variants of a given SSR locus can be identified by agarose gel electrophoresis (AGE) or polyacrylamide gel electrophoresis (PAGE), low-complexity methods used routinely in molecular genetics laboratories.
PAGE genotyping is more labor intensive but provides better resolution, allowing polymorphisms as small as a single base pair to be identified (Penha et al.). In this case, each DNA sample is loaded into a capillary containing a polyacrylamide matrix in which the electrophoresis is performed.
The fluorescence emitted by the labeled primer is captured and the molecular mass of the amplified fragment is determined. The result is an electropherogram showing fluorescence peaks corresponding to each amplified allele. Lastly, the genotyping stage consists of comparing the electropherograms of different individuals (see Culley et al.). The most appropriate genotyping method for each project is defined according to the species under investigation, the sensitivity required in determining allele variations, the availability of the equipment and cost effectiveness.
The amplification and genotyping stages can be optimized to multiplex different SSR loci, cutting costs and saving time, and allowing large-scale analysis (Brown et al.). There are two ways of performing multiplexed analysis of microsatellite loci. The following stages are essential: (i) determining the length in bp of the alleles at each SSR locus; (ii) selecting loci whose allele lengths do not overlap; (iii) in silico testing of the melting temperature (Tm) and the possible formation of secondary structures between the primers of the SSR loci selected.
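The first two selection stages and the Tm check can be sketched in Python as follows. This is only a minimal illustration under stated assumptions: Tm is approximated with the simple Wallace rule (2 °C per A/T, 4 °C per G/C), which is rough and meant for short oligonucleotides, and the greedy selection, the `min_gap` and `max_tm_spread` parameters, and the data layout are all hypothetical. Secondary-structure testing between primers is not implemented.

```python
def wallace_tm(primer):
    """Rough melting temperature (deg C) by the Wallace rule: 2(A+T) + 4(G+C)."""
    p = primer.upper()
    return 2 * (p.count("A") + p.count("T")) + 4 * (p.count("G") + p.count("C"))


def sizes_overlap(a, b, min_gap):
    """True if two (min_bp, max_bp) allele size ranges come within min_gap bp."""
    return a[0] <= b[1] + min_gap and b[0] <= a[1] + min_gap


def pick_multiplex_set(loci, min_gap=10, max_tm_spread=5):
    """Greedily pick loci with separated allele sizes and compatible primer Tm.

    Each locus is a dict with keys: name, sizes (min_bp, max_bp), fwd, rev.
    """
    chosen, tms = [], []
    for locus in sorted(loci, key=lambda l: l["sizes"][0]):
        if any(sizes_overlap(locus["sizes"], c["sizes"], min_gap) for c in chosen):
            continue  # allele size ranges would be superimposed
        candidate_tms = tms + [wallace_tm(locus["fwd"]), wallace_tm(locus["rev"])]
        if max(candidate_tms) - min(candidate_tms) > max_tm_spread:
            continue  # primers would not share an annealing temperature
        chosen.append(locus)
        tms = candidate_tms
    return [c["name"] for c in chosen]


# Hypothetical loci and primers, for illustration only.
P = "ATGCATGCATGCATGCATGC"
loci = [
    {"name": "A", "sizes": (100, 120), "fwd": P, "rev": P},
    {"name": "B", "sizes": (115, 140), "fwd": P, "rev": P},
    {"name": "C", "sizes": (150, 170), "fwd": P, "rev": P},
    {"name": "D", "sizes": (200, 220), "fwd": "GC" * 10, "rev": P},
]
print(pick_multiplex_set(loci))
```

A greedy pass is easy to follow but does not guarantee the largest compatible set; a real design would also weigh dye channels and primer-dimer risk.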
The second multiplexed SSR loci analysis method entails multiplexed genotyping. In this case, amplifications are performed separately, but the amplified products of a biological sample are mixed and loaded into the same electrophoresis gel channel or sequencing capillary.
Guichoux et al. Several aspects are reviewed, including the overall cost of SSR genotyping as a function of the degree of multiplexing and the number of genotyped samples. For instance, the most widely cited commercial kit has a cost per sample of 1.
The authors then suggest solutions to cut the final cost per sample. According to these authors, most of the work done to develop and optimize SSR multiplexing actually consists of phases common to all SSR development projects. In the past, alternative methods have been developed to facilitate multiplexed PCR genotyping by capillary electrophoresis, such as the M13-tailed primer method (Oetting et al.).
In this method, the sequencing reaction is performed as a multiplexed PCR using the M13 reverse primer, conjugated with a fluorescent dye, and various modified SSR forward primers. The SSR primers are modified by a bp extension at the 5' end, identical to the M13 nucleotide sequence. In the first PCR cycle, amplification is based on the SSR primers, forming an M13 annealing site at the 3' end, which is used in the second amplification cycle.
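The tailing step can be sketched as simple string manipulation. In the Python example below, the tail sequence is an assumption: it is the commonly cited M13(-21) universal sequence, used only as a placeholder because the text does not give the tail sequence or its length. The function names and the example primer are hypothetical.

```python
# Commonly cited M13(-21) universal sequence; an assumption, since the text
# does not state the tail sequence or its length.
M13_TAIL = "TGTAAAACGACGGCCAGT"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA string."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def add_m13_tail(ssr_forward_primer, tail=M13_TAIL):
    """Prepend the tail to the 5' end of an SSR forward primer."""
    return tail + ssr_forward_primer.upper()


def labeled_primer_site(tailed_primer, tail=M13_TAIL):
    """3'-terminal sequence of the strand copied from the tailed primer.

    This is the complement of the tail, i.e. the site to which a labeled
    primer carrying the tail sequence can anneal in later cycles.
    """
    copied_strand = reverse_complement(tailed_primer)
    return copied_strand[-len(tail):]


tailed = add_m13_tail("acgtgacctgatcgatgca")  # hypothetical SSR forward primer
print(tailed)
print(labeled_primer_site(tailed) == reverse_complement(M13_TAIL))
```

Because the label sits on one shared primer rather than on each locus-specific primer, only one fluorescent oligonucleotide needs to be purchased per dye.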
A variant of this technique (Multiplex-Ready PCR) was subsequently published with the aim of cutting the cost of primer labeling, which is usually 5 to 10 times that of conventional primer synthesis (Hayden et al.). Microsatellite genomic distribution, biological function and practical utility have been reviewed in a number of articles over the past two decades, some of which are highlighted here: Jarne and Lagoda.
With the aim of investigating the use of microsatellite markers over the period from to in the genetic analysis of cultivated plants, we conducted a search in the main database of Web of Science (Web of Science TM Core Collection). Finally, the search was refined by selecting the field of Plant Science, and all resulting hits were manually checked. We found unique records (Figure 3, Supplementary Material Table S1) showing that microsatellites continue to be used as high-relevance molecular markers in the genetic analysis of cultivated plants.
The number of publications rose steadily until , and then fell back, possibly due to the ease with which genetic studies could be carried out using SNPs.
Chromosome-wise distribution of di-, tri-, tetra-, penta-, and hexanucleotide repeat SSR loci in the spinach genome.
Primer pairs were successfully designed for 35, of the total SSR loci and were pursued further in the study.
Similarly, SSR loci located less than bp apart were removed. Next, a BED-format file containing the coordinates of all 19, selected SSR loci was prepared. The whole-genome sequences of 21 spinach accessions were mapped to the chromosome sequences of the reference genome, and aligned BAM files were generated for each spinach accession.
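The proximity filter and the coordinate file can be sketched as follows. This is a minimal Python illustration, not the authors' pipeline: the distance threshold is left as a parameter because its value is missing from the text, and removing every locus that has a close neighbor (rather than only one of the pair) is an assumption. It writes a generic four-column BED layout; a genotyping program may require additional columns.

```python
def drop_close_loci(loci, min_distance):
    """Remove loci lying closer than min_distance bp to a neighboring locus.

    loci: (chrom, start, end, name) tuples, 0-based half-open, assumed
    non-overlapping. Every locus with a close neighbor on the same
    chromosome is dropped.
    """
    ordered = sorted(loci, key=lambda l: (l[0], l[1]))
    kept = []
    for i, (chrom, start, end, name) in enumerate(ordered):
        prev_ok = next_ok = True
        if i > 0 and ordered[i - 1][0] == chrom:
            prev_ok = start - ordered[i - 1][2] >= min_distance
        if i + 1 < len(ordered) and ordered[i + 1][0] == chrom:
            next_ok = ordered[i + 1][1] - end >= min_distance
        if prev_ok and next_ok:
            kept.append((chrom, start, end, name))
    return kept


def to_bed_lines(loci):
    """Format loci as tab-separated BED lines: chrom, start, end, name."""
    return ["%s\t%d\t%d\t%s" % locus for locus in loci]


# Hypothetical coordinates and threshold, for illustration only.
loci = [
    ("chr1", 100, 120, "a"),
    ("chr1", 130, 150, "b"),
    ("chr1", 500, 520, "c"),
    ("chr2", 100, 120, "d"),
]
for line in to_bed_lines(drop_close_loci(loci, min_distance=50)):
    print(line)
```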
Using the HipSTR program, in silico genotyping was performed using the aligned BAM files of 21 spinach accessions, BED files containing coordinates of 19, SSR loci, and reference chromosome sequences.
Of these, loci were either monomorphic or had missing calls in all accessions and were not pursued. Furthermore, SSR loci were discarded because they contained non-reference alleles in fewer than two accessions. The remaining SSRs, which showed non-reference alleles in two or more of the 21 spinach accessions, were retained as the set of polymorphic SSR loci.
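The filtering logic described above can be expressed as a small predicate. The Python sketch below assumes that each accession's call has already been parsed into a pair of allele values (for example repeat counts), with `None` for a missing call; the data layout, function names and the `min_nonref` default of two accessions are assumptions for illustration, and the exact threshold should be checked against the original analysis.

```python
def nonref_accessions(calls, ref_allele):
    """Count accessions carrying at least one non-reference allele."""
    return sum(
        1
        for alleles in calls.values()
        if alleles is not None and any(a != ref_allele for a in alleles)
    )


def is_polymorphic(calls, ref_allele, min_nonref=2):
    """Decide whether a locus is retained as polymorphic.

    calls: dict mapping accession name to an (allele1, allele2) tuple,
    or None when the genotype is missing.
    """
    called = [alleles for alleles in calls.values() if alleles is not None]
    if not called:
        return False  # missing calls in all accessions
    # A monomorphic locus has zero non-reference accessions and fails here too.
    return nonref_accessions(calls, ref_allele) >= min_nonref


# Hypothetical genotype calls, for illustration only.
locus = {"acc1": (6, 6), "acc2": (5, 7), "acc3": (5, 5), "acc4": None}
print(is_polymorphic(locus, ref_allele=5))
```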
The HipSTR alignment of the genome sequences, in silico genotyping, and allele calling approach used in this study is shown in Fig. The physical map showing the distribution of the polymorphic SSR loci in the spinach genome was drawn using unique colors for each SSR motif length Fig. Detailed information on these in silico identified polymorphic SSR loci is provided in Supplementary Table S2, which includes the physical position on the reference genome, reference SSR motifs, repeat numbers, and primer pairs for PCR amplification.
In addition, bp flanking sequences on both sides of the SSR loci and the number of alleles identified following in silico genotyping in the set of 21 spinach accessions are provided (Supplementary Table S2). Of the polymorphic loci identified in this study, the di- and trinucleotide repeat loci were more abundant, while the tetra-, penta-, and hexanucleotide repeat loci were fewer in number (Table 2). Chromosomes 3 and 4 were the longest and harbored more SSRs and, as expected, contained a higher number of polymorphic SSRs.
The polymorphic SSRs identified here were evenly distributed throughout the six chromosomes, although some regions had higher densities and gaps Fig. A search of the physical location of all polymorphic SSRs for an overlap with the genes on the reference genome GFF files found
Genome sequences of multiple accessions were aligned to the reference genome and variation in the number of repeat units for the SSR repeat motif 'ATG' among the accessions was recorded. The reference sequence containing SSR loci is displayed on the top row, while the allele sizes of each accession are displayed on the second row and the aligned reads of each accession are presented on succeeding rows.
Physical map location and distribution of in silico identified SSR markers from the genome sequence of spinach cultivar Sp75.
Thirty-six SSR loci were randomly selected from the set of in silico identified polymorphic SSR loci and genotyped using a molecular assay on a panel of 48 spinach accessions to confirm and validate the polymorphism potential of the in silico identified SSR loci (Table 3).
The 36 SSRs were distributed evenly across all six spinach chromosomes and comprised 1 di-, 23 tri-, 4 tetra-, 2 penta-, and 6 hexanucleotide repeats. Primers were redesigned for some loci to fit multiplexing during molecular validation. Of the 36 SSR loci, 34 gave clear polymorphic band profiles among the spinach panel in this study, while two primer pairs did not amplify (Table 3). The amplification success rate of the primer pairs in this set of accessions was
A total of alleles were scored at these 34 SSR loci, ranging from 2 to 5 alleles per locus, with an average of 2.
Expected heterozygosity or gene diversity (He) ranged from 0. S1, suggesting that the spinach accessions used in this study are genetically differentiated into three main populations. We chose to report the three population groups when assigning the spinach accessions genotyped with the SSR markers in this study.
A membership probability cutoff Q value of 0. The classification of spinach accessions into the population groups are provided in Supplementary Table S1. The Q1 group comprised 17 accessions Fifteen accessions Accessions from China, Afghanistan, Pakistan, and Greece merge in the Q2 population, although these accessions also contain a significant ancestry proportion from the Q3 population. The Q3 group comprises 11 accessions In addition, a few S. The remaining five accessions Of the admixed population, S.
Population structure analysis classified 48 spinach accessions into three population groups based on delta K analysis. The distribution of spinach accessions to the three population clusters Q1, Q2, and Q3 was colored red, green, and blue. The accession name and country of origin are denoted on the x-axis, while the y-axis represents the membership proportion of accession to different population groups.
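Assigning an accession to a population group from its membership proportions can be sketched as below. This is an illustrative Python example under stated assumptions: the cutoff is passed as a parameter because its value is truncated in the text, the 0.6 used in the demonstration is purely hypothetical, and the function name is invented for this sketch. The label "Qm" for admixed accessions follows the naming used in this study.

```python
def assign_group(membership, cutoff):
    """Assign an accession to the group with the highest membership proportion.

    membership: dict such as {"Q1": 0.8, "Q2": 0.1, "Q3": 0.1}.
    Returns the group name if its proportion reaches the cutoff,
    otherwise "Qm" (admixed).
    """
    best = max(membership, key=membership.get)
    return best if membership[best] >= cutoff else "Qm"


# Hypothetical membership proportions and cutoff, for illustration only.
print(assign_group({"Q1": 0.80, "Q2": 0.10, "Q3": 0.10}, cutoff=0.6))
print(assign_group({"Q1": 0.40, "Q2": 0.35, "Q3": 0.25}, cutoff=0.6))
```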
The spinach accessions differentiated into three main clusters in the PCA plot Fig. The first two principal components, PC1 and PC2, explained Cluster 1 contains the Q1 accessions, cluster 2 contains the Q2 accessions, cluster 3 contains the Q3 accessions, and the admixed accessions, colored cyan, lie in the center of the three clusters. The same color code was used for accessions belonging to Q1, Q2, and Q3, and cyan for the admixed Qm population.
The genetic diversity in the spinach panel was analyzed using the maximum likelihood method based on the Tamura-Nei model 45 in MEGA 7. The phylogenetic analysis of the 48 spinach accessions with the 34 newly developed genome-wide SSR markers showed three separate spinach accession clusters with some overlaps Fig.
The accessions from South and East Asia Q3 group form a separate cluster, close to the West Asian accessions, and were distant from the European and United States accessions and the differential cultivars.
In contrast, the S. Overall, the diversity analysis indicates three well-differentiated populations in the worldwide spinach panel, and the accessions were consistent in the neighbor-joining tree, STRUCTURE, and PCA results despite some overlaps and mismatches Figs.
Genetic grouping of spinach accessions based on SSR markers corresponded well with their geographical origin, domestication history, and pedigree, and the genetic clustering of accessions was similar to previous genetic diversity studies 26. The advancement of sequencing technologies and the availability of genome sequences for many commercial and specialty crops have eased the discovery of SSR markers in the previous decade.
Previously, SSR markers were discovered by sequencing genomic libraries enriched for SSRs, an approach that was not efficient in terms of cost, time, and technical resources 8, 48. The availability of genomic and transcriptomic sequences has increased the rate and efficiency of developing SSR markers, and SSRs have been developed in model, non-model, and orphan crops 50, 51. However, many of the recent SSR development studies, particularly those using whole-genome and transcriptome sequences, tested a small number generally between 20 and of randomly selected SSR loci to report a few useful polymorphic markers.
The challenge remained in screening and identifying polymorphic loci with variable repeat length across the panel. Here, we present a new approach to discovering polymorphic SSR markers by aligning and comparing the genome sequences of multiple accessions against the reference genome to count and genotype SSR loci variation in repeat length.
A subset of the polymorphic SSR loci identified computationally with the HipSTR program 32 was validated in a set of 48 diverse spinach germplasm accessions to confirm polymorphism in a PCR-based molecular assay.
This study efficiently discovered a large set of genome-wide SSR markers with known physical map locations across the six spinach chromosomes. The SSR distribution in the Sp75 genome was previously reported 20 and is not described in detail here. This study aimed to generate a large set of polymorphic SSR loci to extend their use in future genetic studies in spinach.
Initially, primer pairs were only designed for 35, loci, while primers were not designed for SSR loci due to a lack of flanking sequences and missing sequences in the template sequences (Table 2). Mononucleotide repeats are error-prone in scoring and were not pursued in this study, while dinucleotide repeats are prone to stuttering 53. In contrast, tri- and higher-order nucleotide motifs are easy to score because they are less prone to amplification errors and stuttering, although the higher-order repeat SSRs were present at a low frequency in spinach.
Hence, SSRs with longer motifs (tri-, tetra-, penta-, and hexanucleotide) are recommended for future genetic studies. Forty-eight spinach accessions from two Spinacia species, S. The two primer pairs that did not amplify in the spinach panel most likely failed because the addition of the M16 tail interfered with amplification by producing secondary structures and changing annealing temperatures.
Genotype results from the molecular assay for the remaining SSR loci corresponded completely to the computationally generated genotype profile, as all 34 markers showed two or more bands across the panel. The result indicates that most of the in silico derived SSRs reported in this study are truly polymorphic.
Primer sequences were designed for all SSR loci, and flanking sequences for all loci have been provided Supplementary Table S2. The flanking sequences can be used to redesign primers with different product sizes to fit in the multiplex runs.
The two loci that failed to amplify in this study might amplify with different primer sets, but this was not tested here.
The majority of the polymorphic SSRs reported here were positioned in the intergenic regions, with The high percentage of genic markers, together with the genes and gene functions reported here, may help in breeding and physiological studies. The large set of SSR markers identified in this study will support genetic and genomic studies in spinach, mainly in fingerprinting, genetic diversity analysis, marker-trait linkage and association analysis, and molecular breeding.
This approach to discovering polymorphic SSR markers is resource-efficient in terms of both time and cost. Moreover, the method reported in this study is transferable to other species with available genome sequences and resequencing data.
The relatively lower number of alleles range of 2 to 5 with an average of 2. The average PIC value of the markers in this study was 0. Using fluorescently labeled primers in multiplex sets, with fragment analysis by capillary electrophoresis, can make allele calling more precise and the results clearer, and avoids the difficulty of scoring bands on agarose and polyacrylamide gels.
Hence, we recommend using semi-throughput capillary electrophoresis for fragment sizing, which we expect to generate clearer allele sizes and record a higher number of alleles. The number of alleles genotyped by the HipSTR program among the genome sequences of 21 accessions and the reference genome is provided (Supplementary Table S2), giving an idea of the expected number of alleles when designing future experiments with these markers.
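For readers who want to reproduce summary statistics such as the number of alleles and the PIC value, the following Python sketch computes both from a list of scored alleles at one marker. It assumes the commonly used definition PIC = 1 - sum(p_i^2) - sum over i<j of 2 p_i^2 p_j^2, where p_i is the frequency of allele i. The source does not state which formula or settings its software applied, and the allele sizes below are hypothetical.

```python
from collections import Counter


def allele_frequencies(alleles):
    """Map each allele (e.g. a fragment size in bp) to its relative frequency."""
    counts = Counter(alleles)
    total = sum(counts.values())
    return {allele: count / total for allele, count in counts.items()}


def pic(frequencies):
    """Polymorphism information content from an iterable of allele frequencies."""
    p = list(frequencies)
    homozygosity = sum(x * x for x in p)
    # Correction term: 2 * p_i^2 * p_j^2 summed over all unordered allele pairs.
    cross = sum(
        2 * p[i] ** 2 * p[j] ** 2
        for i in range(len(p))
        for j in range(i + 1, len(p))
    )
    return 1.0 - homozygosity - cross


# Hypothetical allele calls (bp) pooled across a panel for a single marker.
scored = [150, 150, 153, 153]
freqs = allele_frequencies(scored)
print(len(freqs), pic(freqs.values()))  # 2 0.375
```

A marker with a single allele gives a PIC of 0, and the value rises as more alleles occur at more even frequencies, so finer fragment sizing that resolves additional alleles would be expected to raise it.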
Fingerprint data generated from PCR genotyping were used to assess the genetic diversity and population structure of the spinach accessions. The population structure and phylogeny assignments performed with allele scores of 34 SSR markers distinguished the worldwide spinach germplasm accessions, and the population group assignments were consistent with previous reports 26 , The clusters generated by structure analysis, neighbor-joining analysis, and PCA were similar.
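Distance-based methods such as neighbor joining start from a pairwise distance matrix computed from the allele scores. The Python sketch below shows one simple way such a matrix could be built. The distance used here (one minus the mean Jaccard similarity of the allele sets at each shared locus) is a swapped-in simplification for illustration and is not claimed to be the measure used by the software in this study. Accession names, marker names, and allele sizes are hypothetical.

```python
def allele_sharing_distance(genotype_a, genotype_b):
    """1 - mean Jaccard similarity of allele sets over loci scored in both samples."""
    shared_loci = [locus for locus in genotype_a if locus in genotype_b]
    if not shared_loci:
        raise ValueError("no loci scored in both accessions")
    similarity = 0.0
    for locus in shared_loci:
        a, b = set(genotype_a[locus]), set(genotype_b[locus])
        similarity += len(a & b) / len(a | b)
    return 1.0 - similarity / len(shared_loci)


def distance_matrix(genotypes):
    """Nested dict of pairwise distances between all accessions."""
    names = list(genotypes)
    return {
        x: {y: allele_sharing_distance(genotypes[x], genotypes[y]) for y in names}
        for x in names
    }


# Hypothetical allele scores (fragment sizes in bp) at two markers.
genotypes = {
    "acc_A": {"ssr1": (150, 150), "ssr2": (200, 204)},
    "acc_B": {"ssr1": (150, 153), "ssr2": (200, 204)},
    "acc_C": {"ssr1": (153, 153), "ssr2": (208, 208)},
}
matrix = distance_matrix(genotypes)
print(matrix["acc_A"]["acc_B"], matrix["acc_A"]["acc_C"])  # 0.25 1.0
```

The resulting matrix could then be passed to a tree-building or ordination tool. Accessions that share most alleles, such as acc_A and acc_B here, end up close together.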
The population group assignments largely corresponded to the geographical origin of the accessions, with a few mismatches and overlaps. A few differential cultivars (Meerkat, Lazio, and Pigeon) were assigned to the Q3 group along with the Southern and East Asian accessions in the Structure analysis.
However, the differentials merged with the Q1 group in the PCA and phylogenetic analyses. But a recent diversity analysis 26 generated similar results, with the South Asian accessions forming multiple clusters, some of them close to accessions from Western Asia.
The phylogenetic trees clustered the two S. Previous studies reported S. Successful amplification of accessions belonging to two Spinacia species S. Efforts to identify genes for major traits are prioritized in spinach. Despite the reduced cost of whole-genome sequencing and reduced-representation sequencing (GBS and RADseq), genotyping with small sets of SSR markers is economical and easy to manage 58 and helps build the framework for genetic studies.
Downy mildew is the most important disease of spinach and has devastated all major production areas, particularly the California valleys, the main spinach production area in the United States. Previous genetic mapping efforts attempted to map the trait locus but were limited by the lack of a dense set of molecular markers, resulting in low resolution of the trait locus. Physically mapped markers can help expedite fine mapping and the study of the genetic control of the trait at finer resolution.
The abundant SSRs identified in this study may be used initially to map the targeted trait locus via association or QTL mapping, followed by SNP markers to narrow the locus interval. For fine mapping and in-depth analysis, targeted sequencing methods such as amplicon sequencing 59 , 60 and hybridization-target enrichment 61 , 62 can be employed to generate sequence data of targeted regions at high coverage and gain insight into the genetic basis of trait control. Markers, including SSRs, at the proximal end of spinach chromosome 3 are promising because the known downy mildew resistance locus, RPF, maps to this region.
The RPF loci have been mapped to the 0. These new SSR markers may also help develop near-isogenic lines (NILs) and track the introgressed resistance-gene region in recurrent susceptible lines. Furthermore, the genomes of several races of the spinach downy mildew pathogen Peronospora effusa (races 1, 12, 13, and 14) have been sequenced 67 , 68 , Such SSR panels could be used routinely in functional diversity analysis and to profile highly variable and continually emerging new pathogen races.
The same approach could be extended to other diseases in spinach and used to examine and understand host-pathogen interactions in other crops.
These new sequences generated for spinach will facilitate the identification of more SSR loci and their mapping across longer chromosome lengths. This study used the available spinach reference genome sequence, along with the genome sequences of additional accessions, to mine SSR loci and develop a large set of polymorphic SSR markers by computationally screening for variation in the number of repeat units among the accessions.
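The idea of screening for variation in repeat number can be illustrated with a toy Python sketch. It is not the pipeline used in this study, which genotyped resequencing data with HipSTR. Here a regular expression finds perfect tri- to hexanucleotide repeats in a reference sequence, and a locus is flagged as polymorphic when the longest run of the same motif differs among accessions. The function names, thresholds, and sequences are hypothetical, and each accession sequence is assumed to be the already extracted region for that locus.

```python
import re


def is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit (e.g. not ATAT)."""
    return motif not in (motif + motif)[1:-1]


def find_ssrs(seq, min_unit=3, max_unit=6, min_repeats=4):
    """Return (start, motif, repeat_count) for perfect tandem repeats in seq."""
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(seq.upper()):
        motif = match.group(1)
        if is_primitive(motif):
            hits.append((match.start(), motif, len(match.group(0)) // len(motif)))
    return hits


def longest_run(seq, motif):
    """Largest number of consecutive copies of motif found anywhere in seq."""
    runs = re.findall(r"(?:%s)+" % re.escape(motif), seq.upper())
    return max((len(run) // len(motif) for run in runs), default=0)


def polymorphic_loci(reference, accessions):
    """Reference SSRs whose repeat count differs in at least one accession."""
    results = {}
    for start, motif, ref_count in find_ssrs(reference):
        counts = {name: longest_run(seq, motif) for name, seq in accessions.items()}
        if any(count != ref_count for count in counts.values()):
            results[(start, motif)] = {"reference": ref_count, **counts}
    return results


# Hypothetical locus: the second accession carries two extra AAG units.
reference = "GGCTC" + "AAG" * 5 + "TTCGA"
accessions = {
    "acc1": "GGCTC" + "AAG" * 5 + "TTCGA",
    "acc2": "GGCTC" + "AAG" * 7 + "TTCGA",
}
print(polymorphic_loci(reference, accessions))
```

Because each retained locus keeps its start coordinate in the reference, markers selected this way come with a known physical position, which is the property the study emphasizes.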
The substantial polymorphism observed in the molecular validation of randomly selected SSR loci supports our strategy for identifying new polymorphic SSR markers. The large set of polymorphic SSR markers developed in this study will support genetic research in spinach, especially for labs with limited resources.
The dense set of SSR markers reported in this study, which are relatively easy and inexpensive to use, may facilitate fingerprinting, genetic diversity analysis, phylogeny assignment, population structure analysis, mapping, and molecular breeding efforts in spinach.
Notably, the polymorphic SSR markers can be employed to investigate genetic diversity and population structure among the wild and cultivated spinach accessions and to identify duplicates and generate a core set of diverse accessions.
Indeed, these markers can be equally valuable in breeding applications for investigating and mapping major and minor traits. Most importantly, our approach to identifying polymorphic SSRs will expedite the development of useful markers with known physical locations and avoid laborious preliminary molecular screening for polymorphism.
Morelock, T. In Vegetables I. Springer, Chapter.
Cao, G. Antioxidant capacity of tea and common vegetables. Food Chem.
Howard, L. Antioxidant capacity and phenolic content of spinach as affected by genetics and growing season.
Vegetables summary.
Zhao, J. Genetic variation and association mapping of seed-related traits in cultivated peanut Arachis hypogaea L. Plant Sci.
Gyawali, S. Microsatellite markers used for genome-wide association mapping of partial resistance to Sclerotinia sclerotiorum in a world collection of Brassica napus.
Sugita, T. Development of simple sequence repeat markers and construction of a high-density linkage map of Capsicum annuum.
Zalapa, J. Using next-generation sequencing approaches to isolate simple sequence repeat SSR loci in the plant sciences.
Biswas, M. Transcriptome wide SSR discovery cross-taxa transferability and development of marker database for studying genetic diversity population structure of Lilium species.
Cheng, J. A comprehensive characterization of simple sequence repeats in pepper genomes provides valuable resources for marker development in Capsicum.
Kalyana Babu, B. Development and validation of whole genome-wide and genic microsatellite markers in oil palm Elaeis guineensis Jacq.
Bhattarai, G. In silico development and characterization of tri-nucleotide simple sequence repeat markers in hazelnut Corylus avellana L. PLoS One 12.
Engelbrecht, J. New microsatellite markers for population studies of Phytophthora cinnamomi, an important global pathogen.
Parada-Rojas, C. Analysis of microsatellites from transcriptome sequences of Phytophthora capsici and applications for population studies.
Cai, G. Comparative genomics approach to build a genome-wide database of high-quality, informative microsatellite markers: Application on Phytophthora sojae, a soybean pathogen.
Khattak, J. Genic microsatellite markers for discrimination of spinach cultivars. Plant Breed.
Kuwahara, K. An analysis of genetic differentiation and geographical variation of spinach germplasm using SSR markers. Plant Genet.
Feng, C. Construction of a spinach bacterial artificial chromosome BAC library as a resource for gene identification and marker development. Plant Mol.
Newly developed SSR markers reveal genetic diversity and geographical clustering in spinach Spinacia oleracea.
Li, S. Genome-wide characterization of microsatellites and genetic diversity assessment of spinach in the Chinese germplasm collection.
Rubatzky, V. Spinach, table beets, and other vegetable chenopods. World Veg.
Ribera, A. A review on the genetic resources, domestication and breeding history of spinach Spinacia oleracea L. Euphytica, 20.
Andersen, S. Wild Crop Relat.
Acquisition and regeneration of Spinacia turkestanica Iljin and S. Crop Evol.
Sneep, J. The domestication of spinach and the breeding history of its varieties. Euphytica Suppl.
On the origin and dispersal of cultivated spinach Spinacia oleracea L.
Xu, C. Draft genome of spinach and transcriptome diversity of Spinacia accessions.
Riangwong, K. Mining and validation of novel genotyping-by-sequencing GBS-based simple sequence repeats SSRs and their application for the estimation of the genetic diversity and population structure of coconuts Cocos nucifera L.
Gymrek, M. Genome Res.
Highnam, G. Accurate human microsatellite genotypes from high-throughput resequencing data using informed error profiles. Nucleic Acids Res.
Cao, M. Inferring short tandem repeat variation from paired-end short reads.
Willems, T. Genome-wide profiling of heritable and de novo STR variations. Methods 14.
Wang, X.
Li, H. Bioinformatics 25.
Untergasser, A. Primer3-new capabilities and interfaces.
Agronomic and molecular characterization of wild germplasm Saccharum spontaneum for sugarcane and energycane breeding purposes.
Schuelke, M. An economic method for the fluorescent labeling of PCR fragments.
Peakall, R. GenALEx 6. Population genetic software for teaching and research-an update. Bioinformatics 28.
Liu, K. PowerMarker: An integrated analysis environment for genetic marker analysis. Bioinformatics 21.
Pritchard, J. Inference of population structure using multilocus genotype data. Genetics.
Evanno, G.
Earl, D.
Bradbury, P. Bioinformatics 23.
Tamura, K. Estimation of the number of nucleotide substitutions in the control region of mitochondrial DNA in humans and chimpanzees.