Help + FAQ

1.  Quick start

1.1  What can PathwayLinker do for me?

The primary goal of PathwayLinker is to support experiments:

• PathwayLinker links the selected protein(s) to signaling pathways via protein-protein interactions and/or genetic interactions,
• Requires no computational background,
• Can help in experiment design and the evaluation of unexpected phenotypes.

1.2  Do I need any computational skills for using PathwayLinker? No.

No. Simply enter the search terms, select the proteins and data sources you wish to use, and then view the report.

1.3  In a nutshell: How to search with PathwayLinker

• On the first page:
  • Select an organism and enter one or more search terms in the query box,
  • or select one of the examples by clicking on one of the numbers below the query box.
  • Press the 'Search' button.
• On the following pages, use the navigation links (step forward or back) and:
  • Select the requested protein(s): one or more proteins for each search term.
  • If necessary, replace one or more of the search terms by clicking on 'Replace'.
  • Allow or disallow unreviewed proteins by clicking on the gray link in the line below the organism's name.
  • Select the sources of interaction data to be used by PathwayLinker for linking the selected protein(s) to signaling pathways.
  • Select the sources of signaling pathways (signaling pathway databases).
• On the report page:
  • View the report, highlight proteins & interactions in the interactive network viewer, analyze proteins separately or in groups, or view statistics.
  • Refine your search by navigating to previous pages with the link 'Change search parameters'.
  • Download & save data from the report page.

1.4  Output formats

Currently the following output formats are available:

• HTML: interactive network visualization (with Cytoscape Web and jQueryUI) combined with analyses:
  • Listing and interactively visualizing the network of the first and second neighbor interactors of the query protein(s)
  • Labeling signaling pathway member proteins among these proteins
  • Providing direct search links for these proteins to primary databases and integrated search engines
• TXT: machine readable plain text generated with the API of PathwayLinker,
• PDF.

1.5  How can I view a previously computed report?

• On the report page scroll down to Download & Save / Short stable address (bottom left of the report page).
• Click on the   Open »   button to the right of Short stable address, then save the short URL of the report or email this URL to your preferred address.

1.6  How can I cite PathwayLinker? Can I cite a publication?

If you find PathwayLinker useful for your work (interactively or through the API), then please cite our work. Thank you. This is the citation:
Linking proteins to signaling pathways for experiment design and evaluation
Farkas I J, Szántó-Várnagy Á, Korcsmáros T
PLoS ONE (2012, in press) doi: 10.1371/journal.pone.0036202

1.7  Contact us: Bug reports, feature requests, comments, etc.

Please send an email to Illés Farkas. Thank you.

2.  Search

2.1  Which species can I use with PathwayLinker?

Currently, you can use PathwayLinker for the nematode Caenorhabditis elegans, the fruit fly Drosophila melanogaster, and humans.

2.2  Can I search for more than one protein?

Yes, you can enter multiple protein names and/or IDs. Separate search terms with spaces.

2.3  Which types of gene/protein identifiers can be used when searching for proteins?

PathwayLinker searches for proteins with the synonym search service of UniProt. Searches are case-insensitive and cover protein description fields too. This is the list of protein/gene synonym types recognized for the three selected organisms:

UniProtKB AC or ID, UniParc, UniRef (50, 90, 100), EMBL/GenBank/DDBJ, EMBL/GenBank/DDBJ CDS, PIR, UniGene, Entrez Gene (GeneID), GI number, IPI, RefSeq, PDB, DisProt, HSSP, DIP, MINT, Ensembl, Ensembl Protein, Ensembl Transcript, GeneID, GenomeReviews, KEGG, TIGR, UCSC, VectorBase, FlyBase, GeneCards, PharmGKB, WormBase, WormPep, eggNOG, HOGENOM, HOVERGEN, OMA, OrthoDB, ProtClustDB, BioCyc, Reactome, DrugBank, NextBio.

2.4  Can I search for drug targets with the DrugBank IDs, Brand names and Generic names of the drugs/compounds?

Yes. If the selected organism is human, then PathwayLinker looks up each search term in the list of DrugBank IDs, Brand names and Generic names downloaded from DrugBank in Jan/2012 (from DrugBank's Full Database / "All Drugs, including target and enzyme information" in DrugCard format). If a search term is found on this list, then the set of search terms passed to the UniProt API is supplemented with the UniProt primary accession(s) of the target(s) of the given drug/compound.
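
The lookup described above can be sketched in Python as follows. This is only an illustration: the function name, the lookup table and its entries are made up, and the case-insensitive matching is an assumption rather than documented behavior.

```python
def expand_search_terms(terms, drug_targets):
    """Supplement search terms with the accessions of matching drug targets.

    drug_targets is a hypothetical table mapping a lower-cased DrugBank ID,
    brand name or generic name to a list of UniProt primary accessions.
    """
    expanded = list(terms)
    for term in terms:
        # Assumption: drug names are matched case-insensitively.
        for accession in drug_targets.get(term.lower(), []):
            if accession not in expanded:
                expanded.append(accession)
    return expanded


# Toy table with placeholder entries (not real DrugBank or UniProt data).
toy_targets = {"exampledrug": ["ACC_1", "ACC_2"], "db_example": ["ACC_1"]}
print(expand_search_terms(["RELA", "ExampleDrug"], toy_targets))
```

The original search terms are kept, so a term that is both a drug name and a protein synonym is still passed on unchanged.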

2.5  Why does my gene/protein not show up in the search? (search details)

By default PathwayLinker searches for UniProtKB/Swiss-Prot (reviewed) proteins and excludes "putative" and "uncharacterized" proteins. This search can be extended by including UniProtKB/TrEMBL (unreviewed), "putative" and "uncharacterized" proteins.

Example:

At the search page select Homo sapiens, type "RELA" (without quotes) and click on "Search". The results you will see on the following screen exclude TrEMBL, "putative" and "uncharacterized" proteins, and are downloaded from UniProt with this query. If you include TrEMBL, putative and uncharacterized proteins (by clicking on "allow TrEMBL" on the "Select proteins" page), then PathwayLinker will download its search results through this UniProt query.

2.6  What is the algorithm used for linking the selected proteins to signaling pathways?

With the selected interaction and signaling pathway membership data sources PathwayLinker lists (i) the first neighbor interactors of the selected proteins and (ii) the signaling pathway memberships of the selected proteins and their first neighbors.
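
A minimal Python sketch of this two-step listing is given below. The function name and the toy data are hypothetical, and all interactions are treated as undirected here, which is a simplification.

```python
def link_to_pathways(selected, interactions, pathway_members):
    """List first neighbors and pathway memberships for the selected proteins.

    interactions: iterable of (protein, protein) pairs, treated as undirected.
    pathway_members: dict mapping a pathway name to the set of its proteins.
    """
    selected = set(selected)
    neighbors = set()
    for a, b in interactions:
        # (i) collect the first neighbor interactors of the selected proteins
        if a in selected and b not in selected:
            neighbors.add(b)
        if b in selected and a not in selected:
            neighbors.add(a)
    # (ii) pathway memberships of the selected proteins and their neighbors
    memberships = {
        protein: sorted(name for name, members in pathway_members.items()
                        if protein in members)
        for protein in selected | neighbors
    }
    return neighbors, memberships


# Toy network and toy pathways (placeholder names only).
toy_interactions = [("A", "B"), ("B", "C"), ("A", "D")]
toy_pathways = {"pathway_1": {"B", "C"}, "pathway_2": {"D"}}
neighbors, memberships = link_to_pathways(["A"], toy_interactions, toy_pathways)
print(sorted(neighbors), memberships)
```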

2.7  Which interaction types does PathwayLinker use for finding the first neighbor interactors of the selected protein(s) and the interactions between the first neighbor interactors?

PathwayLinker uses physical (both small-scale and high-throughput) and genetic interactions. One of the small-scale interaction sources, SignaLink, contains directed interactions, while all others contain undirected interactions. These are the interaction databases used by PathwayLinker:

• For high-throughput physical interactions:
• For "small-scale" physical interactions:
• For genetic interactions:
• Additional online tools:

2.8  Which signaling pathway databases does PathwayLinker use?

PathwayLinker uses signaling pathway membership data from three sources. These are the signaling pathway databases used by PathwayLinker:

2.9  Can I request an additional interaction type or pathway database?

Yes, please send an email to Illés Farkas.

2.10  Can I change the search parameters?

Yes, at any time you can use the navigation links on the top right to move back and forward. These navigation links are often repeated at the bottom of the page too.

2.11  How can I learn more about the autocomplete function (on the search page)?

As you type in the search box ("Proteins"), PathwayLinker recommends completions based on its database of keywords extracted from UniProt.
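
As an illustration only, prefix-based completion over a sorted keyword list could look like the Python sketch below. The function name, the result limit and the keyword list are assumptions; the actual autocomplete database and scripts are the ones linked below.

```python
import bisect


def complete(prefix, sorted_keywords, limit=5):
    """Return up to `limit` keywords that start with `prefix`.

    sorted_keywords must be lower-cased and sorted, so that all keywords
    sharing a prefix form one contiguous block found by binary search.
    """
    prefix = prefix.lower()
    start = bisect.bisect_left(sorted_keywords, prefix)
    matches = []
    for keyword in sorted_keywords[start:]:
        if not keyword.startswith(prefix) or len(matches) == limit:
            break
        matches.append(keyword)
    return matches


# Toy keyword list for illustration.
keywords = sorted(["axin", "axin1", "axin2", "rela", "tll"])
print(complete("Ax", keywords))
```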

» Read more and download the autocomplete database and the Perl scripts generating it (with documentation)

3.  Results

3.1  How can I view the network of the selected protein(s) and its (their) first neighbor interactors?

The report page of PathwayLinker contains an embedded visualization of the network of the selected protein(s) and its (their) first neighbor interactors. You can highlight (separately or together) the interactions of each interaction type and the proteins of each signaling pathway. The visualization tool uses Cytoscape Web.

3.2  Does PathwayLinker provide external links for the analysis of the functions of (i) individual proteins and (ii) the entire set of proteins (selected + 1st neighbors)?

Yes, the report page of PathwayLinker provides direct links to the result pages of external databases.

3.3  Which signaling pathways are significantly overrepresented among the displayed proteins?

The hypothesis (this will be tested)

In the report (under Statistics / Pathways) we compare two groups of proteins: (a) the proteins selected in your query and their first neighbor interactors (i.e., the proteins displayed in the network on the right side of the report) and (b) the set of all proteins of the selected organism. For each signaling pathway, s, taken from the selected signaling pathway sources we test the following hypothesis: Members of s are overrepresented in set (a) compared to set (b).

The null hypothesis (reference)

As a null hypothesis we assume that the sets (a) and s have been selected independently. In other words, for the statistical control of set (a) we consider all allowed sets of proteins with equal weight, without a bias toward the signaling pathway s.

Testing the hypothesis

Denote the number of proteins in group (a) by Na and the number of proteins in group (b) by Nb (note: Nb >> Na). Test the above hypothesis for each signaling pathway, s, containing at least one protein from group (a). Denote the number of proteins contained by both s and (a) by na,s. Define nb,s similarly. Next, select randomly Na proteins from group (b) and count how many of the nb,s pathway member proteins are within this selection. If 0 ≤ na,s ≤ Na, then the probability of having exactly na,s pathway member proteins in the random selection (which has size Na) is, according to the hypergeometric distribution,

p(n_{a,s})=\binom{N_b}{N_a}^{-1}\,\binom{n_{b,s}}{n_{a,s}}\,\binom{N_b-n_{b,s}}{N_a-n_{a,s}}

The probability of observing the actual na,s value or a larger one is the p-value of observing the actual na,s value:

P(n_{a,s})=\sum_{x=n_{a,s}}^{\mathrm{min}(N_a,\,n_{b,s})}\,\binom{N_b}{N_a}^{-1}\,\binom{n_{b,s}}{x}\,\binom{N_b-n_{b,s}}{N_a-x}

This is the p-value used on the report page of PathwayLinker.
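
For small inputs the p-value can be evaluated directly with exact binomial coefficients. The Python sketch below does this; the function and argument names are chosen for this example and are not taken from PathwayLinker.

```python
from math import comb


def hypergeom_pmf(n_as, N_a, n_bs, N_b):
    """Probability of exactly n_as pathway members among N_a random proteins."""
    return comb(n_bs, n_as) * comb(N_b - n_bs, N_a - n_as) / comb(N_b, N_a)


def pathway_p_value(n_as, N_a, n_bs, N_b):
    """Probability of observing n_as or more pathway members in the selection."""
    upper = min(N_a, n_bs)
    return sum(hypergeom_pmf(x, N_a, n_bs, N_b) for x in range(n_as, upper + 1))


# Toy numbers: 10 proteins in total, 4 of them in the pathway,
# 3 proteins displayed, 2 of the displayed proteins in the pathway.
print(pathway_p_value(2, 3, 4, 10))
```

With these toy numbers the sum is (6·6 + 4·1)/120 = 1/3.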

Fast computation of the p-value

First, we save into a temporary array the logarithm of each number between 1 and Nb. Next, we save the logarithm of the factorial of each number between 0 and Nb. Then we compute the logarithm of each term in the above expression.

Last, we exponentiate and add all terms that are larger than r=1.0e-6 times the largest term. The number of terms on the r.h.s. is at most y=min(Na, nb,s), thus, according to the central limit theorem, the expected relative error of this approximated P(na,s) value will be below .
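
The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code used by PathwayLinker: the function names are made up, and the log-factorial table is built as a running sum of logarithms.

```python
from math import exp, log


def log_factorials(n_max):
    """Return a table with table[k] = log(k!) for k = 0..n_max."""
    table = [0.0] * (n_max + 1)
    for k in range(1, n_max + 1):
        table[k] = table[k - 1] + log(k)
    return table


def fast_p_value(n_as, N_a, n_bs, N_b, r=1.0e-6):
    """Approximate the hypergeometric tail sum using log-factorials."""
    lf = log_factorials(N_b)

    def log_binom(n, k):
        return lf[n] - lf[k] - lf[n - k]

    upper = min(N_a, n_bs)
    log_terms = [
        log_binom(n_bs, x) + log_binom(N_b - n_bs, N_a - x) - log_binom(N_b, N_a)
        for x in range(n_as, upper + 1)
        if N_a - x <= N_b - n_bs  # skip terms whose binomial coefficient is zero
    ]
    if not log_terms:
        return 0.0
    # Keep only the terms larger than r times the largest term.
    threshold = max(log_terms) + log(r)
    return sum(exp(t) for t in log_terms if t > threshold)


print(fast_p_value(2, 3, 4, 10))
```

Working with logarithms avoids overflow in the factorials when Nb is on the order of the number of proteins in a proteome.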

The mean, E, and standard deviation, σ, of this distribution are

E=\frac{N_a\,n_{b,s}}{N_b} \,,\,\, \sigma =\bigg[\frac{N_a\,\,n_{b,s}\,(N_b-N_a)\,(N_b-n_{b,s})}{N_b^{\,2}\,(N_b-1)}\bigg]^{1/2}

One alternative way to quantify the significance of the actual na,s value is to compute its Z-score:

Z=\frac{n_{a,s}-E}{\sigma}

This is the Z score for each pathway, s, in which at least one protein is present from the group of proteins containing the selected proteins and their first neighbor interactors. A positive (or negative) Z score means that, compared to group (b), the proteins of group (a) appear more (or less) frequently in the given signaling pathway.
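
The mean, standard deviation and Z score defined above translate directly into a few lines of Python. The function name is hypothetical, and the sketch assumes a non-degenerate case (standard deviation above zero).

```python
from math import sqrt


def pathway_z_score(n_as, N_a, n_bs, N_b):
    """Z score of the observed overlap n_as under the hypergeometric model."""
    mean = N_a * n_bs / N_b
    variance = N_a * n_bs * (N_b - N_a) * (N_b - n_bs) / (N_b ** 2 * (N_b - 1))
    return (n_as - mean) / sqrt(variance)


# Toy numbers: 10 proteins in total, 4 in the pathway, 3 displayed, 2 overlapping.
print(round(pathway_z_score(2, 3, 4, 10), 4))
```

Here the mean is 1.2 and the variance is 0.56, so the observed overlap of 2 lies about one standard deviation above the expectation.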

4.  Downloads

4.1  How can I download the data used by PathwayLinker?

All data are stored as the names of subdirectories and sub-subdirectories, etc.

» Download the data set (detailed documentation below)

Data structure

root directory
|
\_ by_protein_ac    -- data by UniProtKB accessions (AC) of proteins
|  |
|  \_ reviewed-only  -- only UniProtKB/Swiss-Prot (reviewed) proteins are listed in this directory
|  |  |
|  |  \_ P            -- information on proteins that have UniProtKB accessions (ACs) starting with P
|  |     |
|  |     \_ 0          -- UniProtKB accessions starting with P0, e.g., P05111
|  |        |
|  |        \_ 5
|  |           |
|  |           \_ 1         ...
|  |              |
|  |              \_ 1
|  |                 |
|  |                 \_ 1  -- data for the protein P05111
|  |                    |
|  |                    \_ gene_names_from_uniprot
|  |                    |  |    list of gene names for P05111 taken from the following fields of UniProtKB:
|  |                    |  |    . first choice: GN Name=(...)
|  |                    |  |    . 2nd choice [if 1st is not available]: ID (...)_HUMAN
|  |                    |  |
|  |                    |  \_ subdirectory names are the gene names
|  |                    |     the characters of the name are converted to numbers and joined with _
|  |                    |     e.g., the directory name 73_78_72_65 encodes INHA
|  |                    |
|  |                    |     Example
|  |                    |     - the Perl command for converting the directory name to the gene name is:
|  |                    |       join("",map{chr}split m/_/,"73_78_72_65") --> this gives INHA
|  |                    |
|  |                    \_ interactions_by_interactor
|  |                    |  |
|  |                    |  \_ <UniProt AC, e.g., B2RAB8> -- this directory contains information on
|  |                    |  |  |                              the interactions between P05111 and B2RAB8
|  |                    |  |  |
|  |                    |  |  \_ <short name of interaction database, e.g., biogrid>
|  |                    |  |     |  data on the interaction between P05111 and B2RAB8 in this source (biogrid)
|  |                    |  |     |  see below the list of interaction sources and their short names used here
|  |                    |  |     |
|  |                    |  |     \_ pmid  -- contains the PubMed ID(s) of the interaction between
|  |                    |  |        |         P05111 and B2RAB8 in the given database (BioGrid)
|  |                    |  |        |
|  |                    |  |        \_ 10746731
|  |                    |  |           this subdirectory name is the PubMed ID of the publication
|  |                    |  |           providing evidence for the interaction between P05111 and B2RAB8
|  |                    |  |           in the given database (which is BioGrid here in this example)
|  |                    |  |
|  |                    |  \_ ... other interactors of P05111
|  |                    |
|  |                    \_ interactions_by_source
|  |                    |  |
|  |                    |  \_ biogrid -- the list of interactions of P05111 according to BioGrid
|  |                    |     |
|  |                    |     \_ B2RAB8 -- this is one of the interactors of P05111 in BioGrid
|  |                    |        |
|  |                    |        \_ pmid
|  |                    |           |
|  |                    |           \_ 10746731 -- the PubMed ID providing evidence on this interaction in BioGrid
|  |                    |
|  |                    \_ original_name_by_database_source
|  |                    |  |
|  |                    |  \_ hprd
|  |                    |  |  |
|  |                    |  |  \_ 78_80_95_48_48_50_49_56_50_46_49
|  |                    |  |     this is the numerically encoded identifier of P05111 in HPRD
|  |                    |  |     78 is the character code of N
|  |                    |  |     80 is the character code of P
|  |                    |  |     ...
|  |                    |  |     the identifier of P05111 in HPRD is NP_002182.1
|  |                    |  |
|  |                    |  \_ biogrid
|  |                    |  |  |
|  |                    |  |  \_ <numerically encoded identifier of P05111 in BioGrid>
|  |                    |  |
|  |                    |  ...
|  |                    |
|  |                    \_ primary
|  |                       |
|  |                       \_ P05111 -- the primary accession (AC) of P05111 in UniProtKB
|  |                          note: . in UniProtKB a protein always has one primary accession,
|  |                                  and optionally one or more further accessions
|  |                                . this directory name maps the AC (UniProtKB accessions) to its primary AC
|  |                                . there are fewer than 40 UniProtKB accessions (ACs) in PathwayLinker
|  |                                  that do not have a 'primary' subdirectory
|  |                                  these are obsolete (deleted) accessions
|  |
|  \_ reviewed-and-unreviewed -- same data structure
|     |                          UniProtKB/TrEMBL (unreviewed) proteins are included
|     |
|     ...
|
\_ by_pathway -- signaling pathway-related information
   |
   \_ kegg -- code of signaling pathway database
   |  |       see below the list of signaling pathway databases
   |  |
   |  \_ 04110 -- the KEGG code of a KEGG pathway
   |  |  |
   |  |  \_ pathway_name -- name of this KEGG pathway
   |  |  |  |
   |  |  |  \_ 67_101_108_108_32_99_121_99_108_101
   |  |  |     name of the KEGG pathway with the code 04110
   |  |  |     67 is the character code of C
   |  |  |     101 is the character code of e
   |  |  |     ...
   |  |  |     the name of the pathway is Cell cycle
   |  |  |
   |  |  \_ pathway_shortName
   |  |     |
   |  |     \_ ... short name of the pathway in the same numerical format
   |  |
   |  \_ other KEGG pathways similarly
   |
   \_ reactome -- signaling pathways from Reactome
   |  |
   |  ...
   |
   \_ signalink -- signaling pathways from SignaLink
      |
      ...
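
For readers who prefer Python to the Perl one-liner shown in the tree, the name encoding and the directory layout can be sketched as below. The function names are hypothetical, and the path layout (one directory level per character of the accession) is inferred from the P05111 example above.

```python
def decode_name(directory_name):
    """Convert a directory name such as 73_78_72_65 back to text (INHA)."""
    return "".join(chr(int(code)) for code in directory_name.split("_"))


def encode_name(text):
    """Convert text to character codes joined with underscores."""
    return "_".join(str(ord(character)) for character in text)


def protein_directory(accession, reviewed_only=True):
    """Relative path of a protein's directory, one level per character."""
    subset = "reviewed-only" if reviewed_only else "reviewed-and-unreviewed"
    return "/".join(["by_protein_ac", subset, *accession])


print(decode_name("73_78_72_65"))
print(decode_name("78_80_95_48_48_50_49_56_50_46_49"))
print(encode_name("Cell cycle"))
print(protein_directory("P05111"))
```

Encoding names as character codes keeps directory names valid even when the original identifier contains spaces, dots or slashes.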


4.2  Data sources

Interactions, signaling pathway member lists and other data were downloaded from the databases listed on the download page. We mapped all protein IDs to UniProt primary accession numbers (ACs) with the UniProt API.

4.3  The Perl program used for compiling all data for PathwayLinker

This script reads external data files, sends requests to the UniProt API, and writes all data as directory names into a nested directory structure.

» Read the documentation of this Perl program and download it

4.4  The Perl program used for computing the Z scores quantifying the functional similarity of interacting proteins

This analysis quantifies whether -- in the selected organism, with the selected interaction sources (any combination allowed) and with the selected one signaling pathway source -- interacting proteins are, on average, members of the same signaling pathway(s). The analysis is performed by comparing the results to randomized cases and by computing Z scores.

» Read the detailed description and download the program here

5.  Applications

5.1  How can I use PathwayLinker for experiment design?

PathwayLinker was designed to support experiments. If you would like to manipulate a gene or a protein (e.g., knock out or knock down), PathwayLinker can help you to explore how this may affect intracellular signaling and cause unwanted phenotypes.

5.2  How can I use PathwayLinker for evaluating an experiment?

Unexpected phenotypes in an experiment may be caused by changes to proteins that are within or close to signaling pathways. If the phenotype is known to be connected to the malfunctioning of a signaling pathway, then PathwayLinker can help identify proteins that are in the vicinity of this signaling pathway.

5.3  How can I use PathwayLinker for network-based drug target selection/prioritization?

With PathwayLinker you can explore the neighborhood and the signaling effect of a known or proposed drug target protein.

5.4  Can I interactively explore signaling pathways with PathwayLinker?

Yes. Enter known signaling proteins of a pathway into the search box to explore their neighborhood(s).

5.5  Does PathwayLinker use predicted interactions?

No. PathwayLinker uses only experimentally validated interactions.

5.6  Does PathwayLinker predict protein functions?

No. PathwayLinker identifies the connections of the query protein(s) to signaling pathways based on experimentally validated interactions.

5.7  Does PathwayLinker explain phenotypes?

Not at the current stage. However, as many altered phenotypes are closely connected to changes in signaling pathways, PathwayLinker may help identify genes that may cause the observed phenotypes.

6.  Does PathwayLinker have an API? Yes.

6.2  Examples

  (1) Download report for the C. elegans protein ego-1,
use all interaction sources available for C. elegans and all signaling pathway sources:
http://PathwayLinker.org/api.cgi?o=cel&q=ego-1

(2) Download report for the D. melanogaster proteins P33244 (ftz-f1), Q05192 (Hr39), and P18102 (tll),
use interactions from BioGrid and the DroID genetic data set
and signaling pathways from KEGG:
http://PathwayLinker.org/api.cgi?o=dme&q=P33244+Q05192+P18102&i=biogrid+droidgenetic&p=kegg

(3) Download report for the human proteins P69905 and P68871 (hemoglobin subunits alpha and beta),
use all interaction types and use signaling pathways from KEGG, Reactome, and SignaLink
http://PathwayLinker.org/api.cgi?o=hsa&q=P69905+P68871&p=kegg+reactome+signalink


6.3  Detailed description of the parameters of the API

o   - Organism
One value: cel, dme, hsa
Default: hsa

q   - [query] The list of search terms used for finding proteins
Format: Words separated with +
Default (one search term): axin (for human), ego-1 (for C. elegans), tll (for D. melanogaster)
Note/1: any of the search terms can be a UniProt AC (accession)
Note/2: with this option for each search term a UniProt synonym search is used to find
the most relevant, i.e., highest scoring UniProt accession (AC) with
the following search:

http://www.uniprot.org/uniprot/
?query=organism:"<ORGANISM_NAME>"
+and+(mnemonic:(<SEARCH_TERM>_*)+or+geneid:(<SEARCH_TERM>)+gene:(<SEARCH_TERM>)+or+
name:(<SEARCH_TERM>)+or+accession:(<SEARCH_TERM>)+or+
((<SEARCH_TERM>)+and+database:drugbank))
+and+reviewed:yes
+and+not+(name:putative+or+name:uncharacterized+or+
gene:putative+or+gene:uncharacterized)
&sort=score

Note:
- replace <SEARCH_TERM> with the search term, e.g., "axin"
- replace <ORGANISM_NAME> with the organism's full name (use a +),
e.g., "Homo+sapiens"
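
The template above can be filled in programmatically. The Python sketch below reproduces it verbatim, including the missing +or+ between the geneid and gene clauses; the function name is hypothetical, and the allow_unreviewed flag mirrors the v parameter of the API. The URL format is the one quoted in this document and may differ from what the UniProt service accepts today.

```python
def uniprot_synonym_query(search_term, organism_name, allow_unreviewed=False):
    """Build the UniProt synonym search URL from the documented template.

    organism_name must already use + instead of spaces, e.g. Homo+sapiens.
    """
    t = search_term
    query = (
        "http://www.uniprot.org/uniprot/"
        f'?query=organism:"{organism_name}"'
        f"+and+(mnemonic:({t}_*)+or+geneid:({t})+gene:({t})+or+"
        f"name:({t})+or+accession:({t})+or+"
        f"(({t})+and+database:drugbank))"
    )
    if not allow_unreviewed:
        # Default: reviewed proteins only, no putative or uncharacterized ones.
        query += (
            "+and+reviewed:yes"
            "+and+not+(name:putative+or+name:uncharacterized+or+"
            "gene:putative+or+gene:uncharacterized)"
        )
    return query + "&sort=score"


print(uniprot_synonym_query("axin", "Homo+sapiens"))
```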

i   - Interaction sources, separated with +
Available values:
for cel:  stringdb
          biogrid ccsbwi8 stringexp
          ccsbgenetic
for dme:  droidotherphysical stringdb
          biogrid droidcuragen droidfinley droidhybrigenics stringexp
          droidgenetic
for hsa:  hprd stringdb
          biogrid stringexp
Default: all sources available for the given organism are selected

v   - 1: allow TrEMBL (not reviewed), uncharacterized and putative proteins
The above synonym search will be replaced with the following:

http://www.uniprot.org/uniprot/
?query=organism:"<ORGANISM_NAME>"
+and+(mnemonic:(<SEARCH_TERM>_*)+or+geneid:(<SEARCH_TERM>)+gene:(<SEARCH_TERM>)+or+
name:(<SEARCH_TERM>)+or+accession:(<SEARCH_TERM>)+or+
((<SEARCH_TERM>)+and+database:drugbank))
&sort=score

0: no, do not allow these additional protein types
Default: 0

p   - Sources of signaling pathways, separated with +
Available values: kegg reactome signalink
Default: all sources are selected

h   - Help
1: Turn on verbose output (default)
0: Return only the data
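
As a convenience, the parameters above can be assembled into a request URL with a small helper. The Python sketch below is hypothetical (it is not part of PathwayLinker) and assumes that the search terms contain no characters that need URL escaping; omitted parameters are left to their documented defaults.

```python
def pathwaylinker_api_url(query_terms, organism="hsa", interactions=None,
                          pathways=None, allow_unreviewed=None, verbose=None):
    """Assemble an api.cgi URL from the o, q, i, p, v and h parameters."""
    params = [("o", organism), ("q", "+".join(query_terms))]
    if interactions:
        params.append(("i", "+".join(interactions)))
    if pathways:
        params.append(("p", "+".join(pathways)))
    if allow_unreviewed is not None:
        params.append(("v", "1" if allow_unreviewed else "0"))
    if verbose is not None:
        params.append(("h", "1" if verbose else "0"))
    return "http://PathwayLinker.org/api.cgi?" + "&".join(
        f"{key}={value}" for key, value in params
    )


# Reproduces example (2) from section 6.2.
print(pathwaylinker_api_url(["P33244", "Q05192", "P18102"], organism="dme",
                            interactions=["biogrid", "droidgenetic"],
                            pathways=["kegg"]))
```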