Annotating your genes

Created: April 04, 2019
Last updated: November 22, 2019

by Juan A. Botía and Sonia García-Ruiz

This is the third and last part of the tutorial series "Getting started with CoExp WebPage application".

Previous: Surfing the Catalog

In the previous tutorial, we explained how to make use of the "Network Catalog" tab. In this one, we will explain how to use the "Gene Set Annotation" tab.

Let us suppose that we want to study the same genes as in the previous tutorial (our list of PD genes obtained from the Genomics England PanelApp), but now within the Putamen and Substantia Nigra tissues. To do that, we first click on the "Gene Set Annotation" tab. Once there, we move to the menu on the left-hand side of the page, which contains a tree view of all categories and networks available in CoExp. Let's now expand the gtexv6 category and select both the "Putamen" and "Substantia Nigra" tissues:

Fig9: Selecting substantia nigra and putamen networks from GTEx V6.

Next, we paste our list of genes (separated by spaces or commas) into the text area at the bottom of the menu. Finally, we click the "Accept" button, and that's all! After a few seconds, the results are displayed as a table in the middle of the page.
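If our gene list comes from a spreadsheet or a text file, it may help to tidy it before pasting. The following is a minimal sketch, assuming only what is stated above (genes separated by spaces or commas). The function name `parse_gene_list` is ours, and dropping duplicates is a convenience we add, not something the application requires.

```python
import re

def parse_gene_list(text: str) -> list[str]:
    """Split a pasted gene list on commas and/or whitespace.

    Empty tokens and repeated gene symbols are dropped; the original
    order and letter case (e.g. C19orf12) are preserved.
    """
    seen = set()
    genes = []
    for token in re.split(r"[,\s]+", text.strip()):
        if token and token not in seen:
            seen.add(token)
            genes.append(token)
    return genes

# Mixed separators, as they often appear after copying from a spreadsheet.
raw = "SNCA, SYNJ1 GCH1,TH\nATP13A2, SNCA"
print(" ".join(parse_gene_list(raw)))
```

The space-separated string printed at the end can be pasted directly into the text area.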

If everything has gone as expected, we should now see a table similar to the one below. Each row corresponds to one of our input genes found within one of the selected networks; genes not found in any selected network are not listed. In this particular example, there are two rows for each gene, one per selected tissue: the Substantia Nigra and Putamen networks.

Fig10: Annotation results for PD genes.

The columns have the following meaning:

gene, category and network: These columns identify one gene from our input list, together with the category and network in which it has been found.
ensgene: This column contains the Ensembl name of the gene.
fisher, FDR and Bonferroni: These three columns express the same concept in three different ways: how significant the overlap is between our input genes and the genes within each module of the selected network.
In particular, they mean:
  • Fisher: p-value obtained from a Fisher's Exact test on the overlap mentioned above.
  • FDR: the Fisher's Exact test p-value, adjusted for multiple testing with a False Discovery Rate procedure.
  • Bonferroni: the Fisher's Exact test p-value after Bonferroni correction. The correction is based on the number of modules in each network that contain any gene from our input set.
mm: This column refers to the module membership of the gene. Any value above 0.5 is a strong value worth noting.
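To make the three significance columns more concrete, here is a minimal sketch of how such values can be computed with the Python standard library. It is not the CoExp implementation: we assume a one-sided Fisher's Exact test (equivalent to the upper tail of a hypergeometric distribution), the Benjamini-Hochberg procedure for the FDR, and a Bonferroni factor equal to the number of modules tested. All function names and the numbers in the usage lines are hypothetical.

```python
from math import comb

def overlap_p_value(overlap, module_size, input_size, background_size):
    """One-sided Fisher's Exact test for over-representation.

    Probability of seeing at least `overlap` input genes in a module of
    `module_size` genes when `input_size` genes are drawn at random from
    `background_size` genes (hypergeometric upper tail).
    """
    max_k = min(module_size, input_size)
    total = comb(background_size, input_size)
    tail = sum(
        comb(module_size, k) * comb(background_size - module_size, input_size - k)
        for k in range(overlap, max_k + 1)
    )
    return tail / total

def bonferroni(p_values, n_tests=None):
    """Multiply each p-value by the number of tests, capped at 1."""
    n = len(p_values) if n_tests is None else n_tests
    return [min(1.0, p * n) for p in p_values]

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    for rank in range(n, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical numbers: 3 modules of one network contain input genes.
p_values = [
    overlap_p_value(overlap=7, module_size=400, input_size=30, background_size=15000),
    overlap_p_value(overlap=4, module_size=900, input_size=30, background_size=15000),
    overlap_p_value(overlap=1, module_size=600, input_size=30, background_size=15000),
]
print(p_values)
print(benjamini_hochberg(p_values))
print(bonferroni(p_values))  # factor = number of modules containing input genes
```

The size of the gene background is an assumption that strongly affects the p-value, so values computed this way will not necessarily match those shown in the table.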

One of the first things we may now wonder is how the genes cluster together across different modules. Exploring this idea is interesting for several reasons. For example, all genes that cluster together in the same module receive the same annotation, because the annotation comes from the module as a whole. Therefore, the higher the number of genes from our input list that belong to a particular module, the stronger the link between the phenomenon we are studying and the module's annotation. If we now click on the header of the fisher column, the rows are sorted by Fisher p-value in ascending order. This gives a quick view of the most evident groups of genes in each module, as follows:

Fig11: Gene clustering at the gene level.

In the image above, we can see that the genes C19orf12, GCH1, SLC6A3, SNCA, SYNJ1, ATP13A2 and TH cluster together within the "darkorange2" module of the "Substantia Nigra" tissue. We may also notice that this clustering is unlikely to be due to random chance, for two main reasons. First, the Fisher's p-value is 10e-4, which is highly significant. Second, it survives both the FDR and Bonferroni corrections (the corrected p-values are lower than 0.05). Another interesting point is that all these genes seem to play a strong role within the darkorange2 module. As we mentioned before, values greater than 0.5 in the mm column are strong; in our example, the values are very close to 1.
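The reasoning in this paragraph (sort by Fisher p-value, keep what survives both corrections, then check module membership) can be sketched in a few lines. This is only an illustration: we assume the table rows are available as dictionaries keyed by the column names described earlier plus a `module` key, and the numeric values below are invented.

```python
from collections import defaultdict

ALPHA = 0.05         # significance threshold used in the text
MM_THRESHOLD = 0.5   # module membership above this is considered strong

def strong_significant_groups(rows, alpha=ALPHA, mm_threshold=MM_THRESHOLD):
    """Group genes by (network, module).

    Only rows whose module overlap survives both the FDR and Bonferroni
    corrections, and whose module membership is strong, are kept.
    """
    groups = defaultdict(list)
    # Ascending Fisher p-value, like the table after clicking the column header.
    for row in sorted(rows, key=lambda r: r["fisher"]):
        survives = row["FDR"] < alpha and row["Bonferroni"] < alpha
        if survives and row["mm"] > mm_threshold:
            groups[(row["network"], row["module"])].append(row["gene"])
    return dict(groups)

# Hypothetical rows: every number here is made up for illustration only.
rows = [
    {"gene": "SNCA", "network": "Substantia Nigra", "module": "darkorange2",
     "fisher": 1e-4, "FDR": 2e-3, "Bonferroni": 4e-3, "mm": 0.93},
    {"gene": "TH", "network": "Substantia Nigra", "module": "darkorange2",
     "fisher": 1e-4, "FDR": 2e-3, "Bonferroni": 4e-3, "mm": 0.97},
    {"gene": "SNCA", "network": "Putamen", "module": "skyblue",
     "fisher": 0.08, "FDR": 0.4, "Bonferroni": 1.0, "mm": 0.71},
]
print(strong_significant_groups(rows))
```

With these invented rows, only the darkorange2 group is returned, because the skyblue row does not survive the corrections.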

Let us explore these results in greater depth. If we click on any of the darkorange2 module links, a new popup window should appear. That window contains the "Catalogue Network" view, showing the details specific to the darkorange2 module. If we look at the table data, we may notice that this module is enriched for REACTOME terms, such as "Axon guidance", and BP terms like "regulation of neurotransmitter levels". In addition, the cell type enrichment for the darkorange2 module is entirely neuronal, including dopaminergic markers. In this sense, we may conclude that this module is clearly PD-related. This analysis also suggests that all these genes are implicated in the different "cell type" processes obtained for the "Substantia Nigra" tissue. Interestingly, we won't see the same result replicated in the "Putamen" tissue, which might suggest that this phenomenon only happens in the Substantia Nigra tissue, and not in the other PD-related tissues.

Another question we may want to answer is "are there other significant clusters in this analysis?". To answer it, we first click on the "SUMMARISE CLUSTERING" button, which is placed at the top of the table. After clicking it, a different view of the table appears. This new view shows the same results as before, but summarised by module. Next, if we order this new table by the "Overlap" column (the number of genes from our input set that fall within the current module), we get the table below:

Fig12: Summarized view of the gene annotation table focused on relevant gene overlaps with modules.

If we look at the table above, we can see that the only statistically significant module is darkorange2, within the Substantia Nigra tissue. However, notice the overlap of 4 genes that cluster together in the skyblue module of the Putamen tissue. Although the p-value obtained is not significant, this result may point to something interesting. Note that we are working with a panel of genes for monogenic PD, and the likelihood of this panel being incomplete is high.
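The summarised view can be thought of as a group-by operation over the gene-level table. The sketch below is our own approximation, not the code behind the "SUMMARISE CLUSTERING" button: it assumes rows are dictionaries with `gene`, `network`, `module` and `fisher` keys, and the example values are hypothetical.

```python
def summarise_clustering(rows):
    """Collapse gene-level rows into one record per (network, module).

    `overlap` counts how many input genes fall within the module.
    """
    summary = {}
    for row in rows:
        key = (row["network"], row["module"])
        entry = summary.setdefault(key, {
            "network": row["network"],
            "module": row["module"],
            "overlap": 0,
            "genes": [],
            "fisher": row["fisher"],  # module-level value, repeated on every row
        })
        entry["overlap"] += 1
        entry["genes"].append(row["gene"])
    # Largest overlap first; ties are broken by the smaller Fisher p-value.
    return sorted(summary.values(), key=lambda e: (-e["overlap"], e["fisher"]))

# Hypothetical gene-level rows (p-values invented for illustration).
example_rows = [
    {"gene": "SNCA", "network": "Substantia Nigra", "module": "darkorange2", "fisher": 1e-4},
    {"gene": "TH", "network": "Substantia Nigra", "module": "darkorange2", "fisher": 1e-4},
    {"gene": "GCH1", "network": "Substantia Nigra", "module": "darkorange2", "fisher": 1e-4},
    {"gene": "SNCA", "network": "Putamen", "module": "skyblue", "fisher": 0.08},
    {"gene": "SYNJ1", "network": "Putamen", "module": "skyblue", "fisher": 0.08},
]
for entry in summarise_clustering(example_rows):
    print(entry["network"], entry["module"], entry["overlap"], entry["fisher"])
```

Ordering by overlap rather than by p-value is what brings modules such as skyblue into view even when they are not statistically significant.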

Let's now move on to the analysis of the whole set of brain tissues available in GTEx v6. As before, we first annotate the genes and then open the summarised clustering view. Finally, we order the results by clicking on the "Overlap" column. If everything has gone as expected, we should see a table similar to the following one:

Fig13: Almost identical results on genes and modules in nigra, frontal cortex and cortex tissues in GTEx.

The image above shows an identical clustering of the genes C19orf12, DCTN1, DNAJC6, PANK2, PRKRA, RAB39B, SNCA, SYNJ1, VPS13A and VPS35 in both the turquoise module of the Anterior Cingulate Cortex and the brown module of the Frontal Cortex. This result is somewhat similar to the one obtained in the Substantia Nigra tissue. As a final thought, we may see that in this tissue the cell specificity signal is strongest in dopaminergic neurons.
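One simple way to quantify how similar two gene clusters are is to compare them as sets. The sketch below uses the gene lists quoted in this tutorial for the cortical modules and for the darkorange2 module of the Substantia Nigra; the Jaccard index is our own choice of similarity measure and is not something CoExp reports.

```python
def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by the size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Cluster shared by the turquoise (Anterior Cingulate Cortex) and
# brown (Frontal Cortex) modules, as listed in the text.
cortex_cluster = {
    "C19orf12", "DCTN1", "DNAJC6", "PANK2", "PRKRA",
    "RAB39B", "SNCA", "SYNJ1", "VPS13A", "VPS35",
}
# Cluster found in the darkorange2 module of the Substantia Nigra.
nigra_cluster = {"C19orf12", "GCH1", "SLC6A3", "SNCA", "SYNJ1", "ATP13A2", "TH"}

shared = cortex_cluster & nigra_cluster
print(sorted(shared))
print(jaccard(cortex_cluster, nigra_cluster))
```

An index of 1 would mean identical clusters, as is the case between the two cortical modules, while the Substantia Nigra cluster shares only part of its genes with them.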

This is the end of the tutorial series "Getting started with CoExp WebPage application".

Go to the first part of the tutorial: Introduction