This page presents general information about this demonstration web server.
How to use:
Using this web server is straightforward. Enter a query position frequency matrix (PFM) in the textbox, then choose the databases to search and the parameter combination. Multiple PFMs can be queried at once by entering them all in the textbox. Several input formats are currently supported, including TRANSFAC [1], JASPAR [2], motif alignments, and consensus sequences. Please refer to the input format page for details.
Current input parameters:

Number of returns: the number of hits returned per query PFM.

k value: the length of the k-mer used to construct the k-mer frequency vector (KFV). Available options: 2, 3, 4, 5.

Distance metric: the distance measure used to calculate the dissimilarity between two constructed KFVs. Available options: cosine angle, Euclidean distance, Pearson correlation coefficient, and a modified Kullback-Leibler discrepancy.
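The parameters above can be illustrated with a minimal Python sketch. It is not the server's implementation. It assumes that the KFV holds the expected count of each k-mer under independent PFM positions, that the cosine measure is one minus the cosine similarity, and that the Pearson measure is one minus the correlation coefficient. The modified Kullback-Leibler discrepancy is omitted because its modification is not described here. All function names are hypothetical.

```python
import itertools
import math


def kmer_frequency_vector(pfm, k):
    """Expected k-mer counts for a PFM given as rows of [A, C, G, T] counts.

    Assumption: positions are independent, so the expected count of a k-mer
    is the sum, over all length-k windows, of the product of base probabilities.
    """
    # Convert each position's counts to probabilities (rows must be non-empty).
    probs = [[count / sum(row) for count in row] for row in pfm]
    # All 4**k k-mers in lexicographic order over the indices of A, C, G, T.
    kmers = list(itertools.product(range(4), repeat=k))
    vector = [0.0] * len(kmers)
    for start in range(len(probs) - k + 1):
        for index, kmer in enumerate(kmers):
            p = 1.0
            for offset, base in enumerate(kmer):
                p *= probs[start + offset][base]
            vector[index] += p
    return vector


def cosine_distance(u, v):
    """One minus the cosine similarity (assumed form of the cosine measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def euclidean_distance(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def pearson_distance(u, v):
    """One minus the Pearson correlation (assumed form of the Pearson measure)."""
    mean_u = sum(u) / len(u)
    mean_v = sum(v) / len(v)
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    var_u = sum((a - mean_u) ** 2 for a in u)
    var_v = sum((b - mean_v) ** 2 for b in v)
    return 1.0 - cov / math.sqrt(var_u * var_v)


# Two made-up 3-position PFMs (illustrative counts, not taken from JASPAR).
query_pfm = [[10, 0, 0, 0], [0, 10, 0, 0], [0, 0, 10, 0]]
other_pfm = [[8, 1, 1, 0], [0, 9, 1, 0], [1, 0, 9, 0]]

query_kfv = kmer_frequency_vector(query_pfm, k=2)
other_kfv = kmer_frequency_vector(other_pfm, k=2)

print(cosine_distance(query_kfv, other_kfv))
print(euclidean_distance(query_kfv, other_kfv))
print(pearson_distance(query_kfv, other_kfv))
```

With k = 2 each KFV has 16 entries, and the vector length grows as 4^k, which is one reason to keep k small.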
For demonstration purposes, three sample PFMs are provided and are filled into the input textbox when the “load sample” button is clicked. The three example PFMs are all from JASPAR: Dorsal_2, RORA, and MYCMAX. MYCMAX hits highly similar bHLH PFMs; Dorsal_2 finds similar PFMs with various gap lengths; RORA demonstrates the ability to search for two-partite motifs, either direct repeats or palindromes, using a monopartite query PFM.
The search result page is designed to have a layout similar to that of STAMP [3]. On the result page, the hits for each query PFM are listed in a separate cell. Both the input and hit PFMs are visualized as sequence logos [4]. In addition to the raw distance value between two PFMs, the associated p-values are also listed.
p-value calculation:
To calculate the empirical p-value associated with each motif-motif comparison, 10,000 simulated matrices were randomly generated following the method described by Sandelin and Wasserman [5] using STAMP [3], based on the motif length distribution in JASPAR core 2008 [2]. Pairwise distance scores were computed between every pair of the 10,000 random matrices, resulting in 4.9995 × 10^7 unique comparisons. The distribution of the resulting scores was used to assess score significance. Given a distance score d, the associated p-value is defined as the area under the distribution curve for distances smaller than d, which approximates the probability of obtaining a distance score ≤ d by chance.
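The empirical p-value lookup described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the server's code: the null distances are assumed to be available as a sorted list, the p-value is taken as the fraction of null distances ≤ d, and the function name and sample values are hypothetical.

```python
import bisect
import math

# Number of unique unordered pairs among 10,000 random matrices.
n_pairs = math.comb(10_000, 2)  # 49,995,000 = 4.9995 x 10^7


def empirical_p_value(sorted_null_distances, d):
    """Fraction of null distances that are <= d (the list must be sorted)."""
    rank = bisect.bisect_right(sorted_null_distances, d)
    return rank / len(sorted_null_distances)


# Tiny made-up null distribution, for illustration only.
null_distances = sorted([0.40, 0.10, 0.30, 0.20])
print(empirical_p_value(null_distances, 0.25))
```

Sorting the null distances once and using binary search keeps each lookup logarithmic in the number of comparisons, which matters when the null distribution holds tens of millions of scores.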
