MARCO FRASCA
Assistant Professor 
Computer Science Department
Università degli Studi di Milano
Room T304, III floor
Via Comelico 39 
Tel: (+39) 02 503 16295/16321
E-mail:  frasca [at] di [dot] unimi [dot] it
(replace [at] with @ and [dot] with .)

Software

UNIPred

UNIPred [1] is an unbalance-aware network integration method that constructs a more reliable and informative composite network for the automated function prediction of proteins. UNIPred assigns each input network a weight (a real number in the interval [0, 1]) according to its 'informativeness' with respect to a given functional class, taking into account the imbalance between annotated and unannotated proteins.
        
       
UNIPred is also available as a web tool.


UNIPred code

The source code can be downloaded here. The archive contains two files:
    • UNIpred.R: R code implementing the UNIPred integration method
    • UNIpred_optimization.c: C source code implementing the core optimization procedure of UNIPred
The C source file must be compiled to produce the shared object UNIPred_optimization.so with the R command
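The exact command is not reproduced here. As a minimal sketch, assuming the standard R CMD SHLIB mechanism and the file names listed above (both are assumptions to adjust to the actual archive), the compilation can be driven from an R session as follows. The -o option is used so that the output name matches UNIPred_optimization.so regardless of the case of the source file name.

```r
# Assumed file names, copied from the archive listing; adjust if they differ.
src <- "UNIpred_optimization.c"
out <- "UNIPred_optimization.so"

# R CMD SHLIB builds a shared object from C sources; -o sets the output name.
r_bin <- file.path(R.home("bin"), "R")
args <- c("CMD", "SHLIB", "-o", out, src)

if (file.exists(src)) {
  status <- system2(r_bin, args)
  # Load the compiled routines only if the build succeeded.
  if (status == 0) dyn.load(out)
} else {
  message("Source file not found: ", src)
}
```

The same build can be run directly from a terminal as R CMD SHLIB with the same arguments. UNIPred.R may already load the shared object itself, in which case the dyn.load call is redundant.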
        

Usage of UNIPred: example with yeast networks

We consider a simple example that integrates two networks for the prediction of the FunCat [2] category "01" (Metabolism) in the yeast model organism.

First, we load the UNIPred code into memory:
> source("UNIPred.R")
We use yeast data from the CRAN package 'bionetdata' [3]. We start with binary protein-protein interaction data (Yeast.STRING.data) from the STRING database [4]; in the bionetdata package these data are represented as a binary named matrix whose names are the systematic names of yeast genes. Yeast.STRING.FunCat is a binary matrix holding the FunCat annotations of the genes included in Yeast.STRING.data. Annotations refer to the funcat-2.1 scheme, available from the MIPS web site:
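The data layout described above can be sketched as follows. The helper class_labels and the toy matrices are hypothetical illustrations, not part of UNIPred or bionetdata; the real-data branch assumes that the bionetdata package is installed and that the annotation columns are named by FunCat code (e.g. "01").

```r
# Hypothetical helper: return the 0/1 label vector of one functional class,
# aligned by name to the nodes of the network W.
class_labels <- function(W, ann, class_id) {
  stopifnot(nrow(W) == ncol(W), !is.null(rownames(W)))
  stopifnot(class_id %in% colnames(ann))
  stopifnot(all(rownames(W) %in% rownames(ann)))
  ann[rownames(W), class_id]
}

# Toy stand-ins with the same shape as the bionetdata objects.
genes <- c("YAL001C", "YAL002W", "YAL003W")
W_toy <- matrix(c(0, 1, 0,
                  1, 0, 1,
                  0, 1, 0), 3, 3, dimnames = list(genes, genes))
ann_toy <- matrix(c(1, 0, 0,
                    0, 1, 1), 3, 2, dimnames = list(genes, c("01", "02")))
y_toy <- class_labels(W_toy, ann_toy, "01")

# Same extraction on the real data, only if 'bionetdata' is available.
if (requireNamespace("bionetdata", quietly = TRUE)) {
  data("Yeast.STRING.data", package = "bionetdata")
  data("Yeast.STRING.FunCat", package = "bionetdata")
  y_string <- class_labels(Yeast.STRING.data, Yeast.STRING.FunCat, "01")
}
```

The network matrix and the label vector for the chosen class are the two inputs a per-network weighting step needs; how they are passed to UNIPred depends on the function signature defined in UNIPred.R.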

Then we consider binary protein-protein interactions (Yeast.Biogrid.data) downloaded from the BioGRID database [5], which collects PPI data from both high-throughput studies and conventional focused studies:

Once the UNIPred weights have been obtained for both networks, each network is extended to include the union of the proteins, and a weighted integration is then performed using the weights computed by UNIPred:
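The extension and integration step can be illustrated with a minimal sketch. The helper names extend_network and integrate_networks are hypothetical, the weights are placeholders rather than UNIPred output, and the integration is assumed to be a plain weighted sum of the extended adjacency matrices.

```r
# Hypothetical helper: embed a named adjacency matrix into a larger node set,
# leaving zero rows/columns for proteins that are absent from the network.
extend_network <- function(W, all_nodes) {
  E <- matrix(0, length(all_nodes), length(all_nodes),
              dimnames = list(all_nodes, all_nodes))
  E[rownames(W), colnames(W)] <- W
  E
}

# Hypothetical helper: weighted sum of the networks over the union of their nodes.
integrate_networks <- function(nets, weights) {
  stopifnot(length(nets) == length(weights))
  all_nodes <- sort(unique(unlist(lapply(nets, rownames))))
  Reduce(`+`, Map(function(W, w) w * extend_network(W, all_nodes),
                  nets, weights))
}

# Toy networks sharing protein "B"; 0.7 and 0.3 are placeholder weights.
W1 <- matrix(c(0, 1, 1, 0), 2, 2, dimnames = list(c("A", "B"), c("A", "B")))
W2 <- matrix(c(0, 1, 1, 0), 2, 2, dimnames = list(c("B", "C"), c("B", "C")))
W_int <- integrate_networks(list(W1, W2), c(0.7, 0.3))
print(W_int)
```

Because the networks are passed as a list, a third or fourth network is handled by appending its matrix and its weight to the two arguments.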

This procedure is easily extended to further networks by computing the weight associated with each additional network through UNIPred. Finally, the integrated network can be given as input to a graph-based algorithm (e.g. COSNet [6]) to predict whether the unlabeled nodes/proteins of the network belong to the functional class under study.

References

[1] Frasca, M., Bertoni, A. and Valentini, G. UNIPred: unbalance-aware Network Integration and Prediction of protein functions. Journal of Computational Biology, 22(12), 1057-1074, 2015. ISSN 1066-5277. doi: 10.1089/cmb.2014.0110.
[2] Ruepp, A. et al. The FunCat, a functional annotation scheme for systematic classification of proteins from whole genomes. Nucleic Acids Research, 32, 5539-5545, 2004.
[3] Re, M. and Valentini, G. bionetdata, CRAN R package: http://cran.r-project.org/web/packages/bionetdata
[4] Von Mering, C. et al. Comparative assessment of large-scale data sets of protein-protein interactions. Nature, 417, 399-403, 2002.
[5] Stark, C. et al. BioGRID: a general repository for interaction datasets. Nucleic Acids Research, 34, D535-D539, 2006.
[6] Frasca, M., Bertoni, A., Re, M. and Valentini, G. A neural network algorithm for semi-supervised node label learning from unbalanced data. Neural Networks, 43, 84-98, 2013.