Surfing the catalog

Created: April 04, 2019
Last updated: November 22, 2019

by Juan A. Botía and Sonia García-Ruiz

This is the second part of the tutorial series "Getting started with CoExp WebPage application".

Previous: Introduction

As introduced in the previous tutorial, in this example we want to annotate a list of genes linked to the monogenic forms of Parkinson's disease, obtained from the Genomics England PanelApp (green genes only). First, we move to the "Network Catalog" tab. Second, we check the categories available in the 'Categories' dropdown menu, located in the left-hand section of the webpage:

Fig1: All available categories.

If everything has gone as expected, the following categories should now be listed in the 'Categories' dropdown:

Category | Description | GitHub
ROSMAP | This category is composed of four networks (all samples, not AD, probable AD and AD) obtained from Frontal Cortex samples. For any further information, please visit this link. | http://github.com/juanbot/CoExpROSMAP
GTEx V6 | This category contains 47 co-expression networks in control tissue, including 13 brain areas. | http://github.com/juanbot/CoExpGTEx
10UKBEC | These are 10 Illumina microarray-based gene expression profiling networks from brain tissue. | http://github.com/juanbot/CoExp10UKBEC
GTEx V7 | This category contains a beta version of the same 47 co-expression networks from GTExV6. | http://github.com/juanbot/CoExpGTExV7
NABEC | This category contains a gene co-expression network created from RNA-seq control Cortex samples, which have been quantified at transcript level with Salmon. | http://github.com/juanbot/CoExpNABEC

Let's now select the gtexv6 category. The 'Network' dropdown should then be filled with the 47 GTEx tissues, as shown in the image below:

Fig2: Available networks after selecting GTExV6 category.


"BY ONTOLOGY" tab

Let's select the 'Caudate' tissue. The 'Module Selection' dropdown should now be enabled and offer two options: "By Ontology" and "By Cell Type". These are two different ways of visualizing the results. The "By Ontology" option generates a table showing the functional annotation of the genes in the 'Caudate' network, whereas the "By Cell Type" option generates a table containing the cell type annotation of those same genes.

In the "By Ontology" view, the generated table contains the following columns:

COLUMN | DESCRIPTION
module | The module, within the selected network, that the annotation term refers to.
p.value | To annotate the genes within each module, the CoExp R suite of packages uses the gProfiler tool. This p-value column therefore indicates how significant the overlap is between the genes associated with the term and the genes found in the current module.
module.size | The total number of genes found in the current module.
ontology | The ontology that the annotation term belongs to: Biological Process (BP), Cellular Component (CC) or Molecular Function (MF), all of which are sub-ontologies of the Gene Ontology project. This column can also contain the values "rea" and "keg", which refer to the REACTOME and KEGG ontologies respectively.
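
To make the p.value column more concrete, here is a minimal sketch of how the significance of an overlap between a term and a module can be scored with a one-sided hypergeometric test. This is only an illustration under stated assumptions: the function name and all the numbers are hypothetical, and gProfiler applies its own statistics and multiple-testing correction, so its reported values will generally differ.

```python
from math import comb


def overlap_p_value(background: int, term_size: int, module_size: int, overlap: int) -> float:
    """P(X >= overlap) when drawing `module_size` genes at random from a
    background of `background` genes, `term_size` of which carry the term."""
    upper = min(term_size, module_size)
    total = comb(background, module_size)
    # Sum the probability of every overlap at least as large as the observed one.
    tail = sum(
        comb(term_size, k) * comb(background - term_size, module_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical numbers, chosen only for illustration:
# 20,000 background genes, 300 annotated with the term,
# a module of 500 genes, 30 of which carry the term.
p = overlap_p_value(background=20000, term_size=300, module_size=500, overlap=30)
print(f"{p:.3g}")
```

The smaller the value, the less likely it is that the overlap arose by chance, which is how the p.value column should be read.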

Next, after clicking the "Accept" button, a table similar to the following one will appear on the right-hand side of the webpage.

Fig3: 'By Ontology' visualisation of the 'Caudate' network.

Note the three buttons ("Copy", "Excel" and "Print") at the top of the table. They allow you, respectively, to copy the content of the table to the clipboard, download it as an Excel file or send it to a printer.


As you may have noticed, every co-expression network is composed of a set of modules. Each module, shown in the "module" column, is named after a different color; each color represents one cluster into which genes have been grouped. In our particular example, we can see that the "Caudate" network has 3,914 annotations covering different functions, obtained mainly from the Gene Ontology, REACTOME and KEGG pathway databases.

Now, let us suppose that we want to obtain all terms related to "RNA processing". For that purpose, we can use the "Search" field, located in the upper right-hand corner of the table. After typing "RNA processing" in that field, only 19 of the 3,914 available terms remain.

Fig4: Modules in the Caudate related to RNA processing in any way.
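
The effect of the "Search" field can be pictured as a case-insensitive substring filter over the rows of the table. The sketch below is an approximation only: the "term" column name and the example rows are assumptions made for illustration, and the web page may treat multi-word queries differently.

```python
def search_rows(rows, query):
    """Keep the rows in which at least one cell contains the query, ignoring case."""
    needle = query.lower()
    return [
        row for row in rows
        if any(needle in str(value).lower() for value in row.values())
    ]


# Made-up rows; "term" is an assumed column name.
rows = [
    {"module": "darkred", "ontology": "BP", "term": "RNA processing"},
    {"module": "darkred", "ontology": "BP", "term": "mRNA processing"},
    {"module": "cyan", "ontology": "rea", "term": "Immune System"},
]

hits = search_rows(rows, "RNA processing")
for row in hits:
    print(row["module"], row["term"])
```

Because the match is on substrings, a query such as "RNA processing" also retains more specific terms such as "mRNA processing".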

If we now focus on the results obtained above, we can see that the darkred module is a good candidate for studying "RNA processing" in the "Caudate" tissue (its p-value is highly significant). Other modules also host related terms. Note that we can sort the terms by the ontology column to group them by the ontology they belong to. Also, on the left-hand side of each row there is a green button with a "+" sign. Clicking it shows some extra information for that row. The amount of extra information shown depends on the context of each row, but in every case the genes listed are the genes from the current module that fall in the overlap between that module and the ontology term.
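
Ordering by the ontology column amounts to sorting the rows and then reading them group by group. A minimal sketch follows, with made-up rows and p-values; within each ontology the rows are additionally ordered by p-value, which is a choice made here for illustration and not necessarily what the web page does.

```python
from itertools import groupby

# Made-up rows for illustration only.
rows = [
    {"module": "darkred", "ontology": "BP", "p.value": 1e-20},
    {"module": "cyan", "ontology": "rea", "p.value": 1e-6},
    {"module": "pink", "ontology": "BP", "p.value": 1e-3},
    {"module": "cyan", "ontology": "keg", "p.value": 1e-4},
]

# Sort by ontology first, then by p-value within each ontology.
ordered = sorted(rows, key=lambda r: (r["ontology"], r["p.value"]))

# groupby only merges adjacent rows, so it must run on the sorted list.
grouped = {
    ontology: [r["module"] for r in group]
    for ontology, group in groupby(ordered, key=lambda r: r["ontology"])
}

for ontology, modules in grouped.items():
    print(ontology, modules)
```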


"BY CELL TYPE" tab

Now, let us suppose that we are interested in finding only those modules that are specific to microglia. Since we chose to search not only by ontology but also by cell type, we can go to the "By Cell Type" tab and use the "Search" field to filter the table by the term "microglia". The "By Cell Type" table reports the results of testing different marker sets (cell type markers in particular) against all modules. The values shown in the table are therefore the significance of each test. The table contains one column per module and one row per marker set.

Fig5: Modules in the Caudate specific to microglial cell type.
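
The layout of this table (one row per marker set, one column per module, p-values in the cells) can be sketched as a nested mapping. The example below filters the rows by a search term and collects, for each module, the marker sets in which it is significant. The marker set names, the p-values and the 0.05 threshold are all assumptions made for illustration.

```python
def significant_modules(table, query, alpha=0.05):
    """Map each module to the marker sets matching `query` in which p < alpha."""
    needle = query.lower()
    hits = {}
    for marker_set, p_by_module in table.items():
        if needle not in marker_set.lower():
            continue  # this row does not match the search term
        for module, p in p_by_module.items():
            if p < alpha:
                hits.setdefault(module, []).append(marker_set)
    return hits


# Made-up marker sets and p-values.
table = {
    "Microglia (set A)": {"cyan": 1e-12, "pink": 0.3, "grey60": 0.04, "darkred": 1.0},
    "Microglia (set B)": {"cyan": 1e-8, "pink": 1e-5, "grey60": 1.0, "darkred": 1.0},
    "Astrocyte (set A)": {"cyan": 1.0, "pink": 1.0, "grey60": 1.0, "darkred": 0.5},
}

hits = significant_modules(table, "microglia")
for module, marker_sets in sorted(hits.items()):
    print(module, len(marker_sets), marker_sets)
```

A module supported by several independent marker sets is backed by more evidence than a module that is significant in only one of them.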

We can see that cyan, pink and, to a lesser extent, grey60 show enrichment for microglial markers. The marker sets appear in the table rows. When a module such as cyan shows significant signals in more than one dataset, this should be understood as alternative evidence of a similar phenomenon coming from different datasets. Apart from the significance reflected by the p-value (which comes from testing the overlap between the genes in the module and the genes in the marker set), the more microglia-related signals are significant, the more confident we can be about the result. Note that in this case we distinguish between deactivated (Type 1) and activated (Type 2) microglia in modules cyan and pink respectively. In that way, cyan and pink reflect different events of the same phenomenon. Indeed, if we now filter the "By Ontology" table by module name and sort by ontology to bring the REACTOME terms to the top, for the cyan module we get:

Fig6: REACTOME terms in the cyan module.

On the other hand, for the pink module we have:

Fig7: REACTOME terms in the pink module.

It seems clear that both modules share generic terms (e.g. Immune System, Innate Immune System). In addition, there are specific terms for each one of them, i.e. "Adaptive immune system" for cyan and "Cytokine signaling in the immune system" for pink.
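
This comparison between the two modules can be expressed with set operations. The sketch below uses only the terms quoted above; the full REACTOME lists for cyan and pink are longer, so this is an illustration of the idea, not a reproduction of the actual tables.

```python
# Only the terms mentioned in the text, not the complete lists.
cyan_terms = {"Immune System", "Innate Immune System", "Adaptive immune system"}
pink_terms = {"Immune System", "Innate Immune System", "Cytokine signaling in the immune system"}

shared = cyan_terms & pink_terms      # terms found in both modules
cyan_only = cyan_terms - pink_terms   # terms specific to cyan
pink_only = pink_terms - cyan_terms   # terms specific to pink

print("shared:", sorted(shared))
print("cyan only:", sorted(cyan_only))
print("pink only:", sorted(pink_only))
```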

Other "cell type" searches may target, for instance, PD or dopaminergic neurons. Let's try the term "dopa":

Fig8: Modules in Caudate related to dopaminergic neurons.

In this case, only three results match the term "dopa". It is important to highlight that, although the enrichment p-values may not be striking, even modest p-value signals can point us to something important.



Next: Annotating your genes