Web server for TFBS PFM database search using PFV algorithm

This web interface demonstrates the effectiveness of our PFV algorithm for PFM searching and retrieval.
The accompanying paper has been submitted to PLoS ONE for peer review.

 

Documentation Page

This page presents general information about this demonstration web server.

How to use:

The web server is straightforward to use. Enter a query position frequency matrix (PFM) in the textbox, then choose the databases to search and the parameter combination. Multiple PFMs can be queried at once by entering them together in the textbox. Several input formats are currently supported, including TRANSFAC [1], JASPAR [2], motif alignments and consensus sequences. Please refer to the input format page for details.

Current input parameters:

  • Number of returns per input query PFM.
  • k value: the length of the k-mers used to construct the k-mer frequency vector (KFV). Available options: 2, 3, 4, 5.
  • Distance metric: the measure used to calculate the dissimilarity between two constructed KFVs. Available options: cosine angle, Euclidean distance, Pearson correlation coefficient, and a modified Kullback-Leibler discrepancy.
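
To make the k value and distance metric options more concrete, here is a minimal Python sketch. It is not the server's implementation. The KFV construction shown is an assumption made for illustration: PFM columns are treated as independent, and each k-mer's expected frequency is summed over all windows of k consecutive columns. The function names are hypothetical, and using 1 - r as the Pearson-based dissimilarity is likewise an assumption. The modified Kullback-Leibler discrepancy is omitted because this page does not define the modification.

```python
import itertools
import math

ALPHABET = "ACGT"


def kmer_frequency_vector(pfm, k):
    """Build a k-mer frequency vector from a PFM (illustrative only).

    pfm is a list of columns, each a dict mapping A/C/G/T to a count.
    ASSUMPTION: columns are independent, so the expected frequency of a
    k-mer in a window is the product of its per-column probabilities.
    Each column must have a positive total count.
    """
    # Convert raw counts to per-column probabilities.
    probs = []
    for column in pfm:
        total = sum(column[base] for base in ALPHABET)
        probs.append({base: column[base] / total for base in ALPHABET})

    # Enumerate all 4**k k-mers in a fixed (lexicographic) order.
    kmers = ["".join(p) for p in itertools.product(ALPHABET, repeat=k)]
    vector = dict.fromkeys(kmers, 0.0)

    # Slide a window of k columns along the motif and accumulate.
    for start in range(len(probs) - k + 1):
        for kmer in kmers:
            p = 1.0
            for offset, base in enumerate(kmer):
                p *= probs[start + offset][base]
            vector[kmer] += p
    return [vector[kmer] for kmer in kmers]


def cosine_angle(u, v):
    """Angle (in radians) between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    # Clamp to [-1, 1] to guard against floating-point rounding.
    return math.acos(max(-1.0, min(1.0, dot / norm)))


def euclidean_distance(u, v):
    """Straight-line distance between two vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def pearson_distance(u, v):
    """1 - Pearson r (ASSUMPTION: how r is turned into a dissimilarity)."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    sd_u = math.sqrt(sum((a - mean_u) ** 2 for a in u))
    sd_v = math.sqrt(sum((b - mean_v) ** 2 for b in v))
    return 1.0 - cov / (sd_u * sd_v)
```

With k = 2 the vector has 4^2 = 16 entries, and with k = 5 it has 4^5 = 1024. Under this assumed construction the KFV length depends only on k, so PFMs of different widths can be compared directly.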

For demonstration purposes, three sample PFMs are provided; clicking the “load sample” button fills them into the input textbox. All three are from JASPAR: Dorsal_2, RORA and MYC-MAX. MYC-MAX hits highly similar bHLH PFMs; Dorsal_2 retrieves similar PFMs with various gap lengths; and RORA demonstrates the ability to find bipartite motifs (direct repeat or palindromic) using monopartite query PFMs. The search result page is laid out similarly to that of STAMP [3]. On the result page, the hits for each query PFM are listed in a separate cell. Both the input and hit PFMs are visualized as sequence logos [4]. In addition to the raw distance value between two PFMs, the associated p-value is listed.

p-value calculation:

To calculate the empirical p-value associated with each motif-motif comparison, 10,000 simulated matrices were randomly generated with STAMP [3], following the method described by Sandelin and Wasserman [5] and based on the motif length distribution in JASPAR core 2008 [2]. Pairwise distance scores were computed between every pair of the 10,000 random matrices, giving 10,000 × 9,999 / 2 = 4.9995 × 10^7 unique comparisons. The distribution of the resulting scores was used to assess score significance. Given a distance score d, the associated p-value is defined as the area under the distribution curve for distances up to d, which approximates the likelihood of obtaining a distance score ≤ d by chance.
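
The empirical p-value lookup described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the server's code: the function names are hypothetical, the Euclidean distance stands in for whichever metric is selected, and the generation of random matrices with STAMP is not reproduced, so the null vectors are simply passed in.

```python
import bisect
import itertools
import math


def unique_pair_count(n):
    """Number of unordered pairs among n items: n * (n - 1) / 2."""
    return n * (n - 1) // 2


def euclidean_distance(u, v):
    """Stand-in metric; any of the listed distance measures could be used."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def build_null_distribution(vectors, distance=euclidean_distance):
    """Sorted distances between every unordered pair of null vectors."""
    return sorted(distance(u, v) for u, v in itertools.combinations(vectors, 2))


def empirical_p_value(sorted_null, d):
    """Fraction of null distances that are <= d."""
    # bisect_right returns the count of entries <= d in a sorted list.
    return bisect.bisect_right(sorted_null, d) / len(sorted_null)
```

For 10,000 random matrices, `unique_pair_count(10000)` gives 49,995,000, matching the 4.9995 × 10^7 comparisons quoted above. Sorting the null distances once lets each later p-value lookup run as a binary search.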

References:

  1. Matys V, Fricke E, Geffers R, Gossling E, Haubrock M, Hehl R, Hornischer K, Karas D, Kel AE, Kel-Margoulis OV et al: TRANSFAC: transcriptional regulation, from patterns to profiles. Nucleic Acids Res 2003, 31(1):374-378.
  2. Bryne JC, Valen E, Tang MH, Marstrand T, Winther O, da Piedade I, Krogh A, Lenhard B, Sandelin A: JASPAR, the open access database of transcription factor-binding profiles: new content and tools in the 2008 update. Nucleic Acids Res 2008, 36(Database issue):D102-106.
  3. Mahony S, Benos PV: STAMP: a web tool for exploring DNA-binding motif similarities. Nucleic Acids Res 2007, 35(Web Server issue):W253-258.
  4. Crooks GE, Hon G, Chandonia JM, Brenner SE: WebLogo: a sequence logo generator. Genome Res 2004, 14(6):1188-1190.
  5. Sandelin A, Wasserman WW: Constrained binding site diversity within families of transcription factors enhances pattern discovery bioinformatics. J Mol Biol 2004, 338(2):207-215.

 

Web server maintained by
Minli Xu (mxu5@uncc.edu) & Dr Zhengchang Su (zcsu@uncc.edu)
Bioinformatics Research Center, University of North Carolina - Charlotte
Last update: 16-July-2009 11:49 AM