MARCO FRASCA
Assistant Professor 
Computer Science Department
Università degli Studi di Milano
Room T304, III floor
Via Comelico 39 
Tel: (+39) 02 503 16295/16321
E-mail:  frasca [at] di [dot] unimi [dot] it
(replace [at] with @ and [dot] with .)

Software

UNIPred

UNIPred [1] is an unbalance-aware network integration method that constructs a more reliable and informative composite network for the automated function prediction of proteins. UNIPred assigns each input network a weight (a real number in the interval [0, 1]) according to its 'informativeness' with respect to a given functional class, taking into account the imbalance between annotated and unannotated proteins.


UNIPred code

The source code can be downloaded here. The archive contains two files:
    • UNIpred.R: R code implementing the UNIPred integration method
    • UNIpred_optimization.c: C source code implementing the core optimization procedure of UNIPred
The C source file must be compiled with R to produce the shared object UNIPred_optimization.so.
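A minimal sketch of the build step, assuming the standard `R CMD SHLIB` tool (from a shell: `R CMD SHLIB UNIPred_optimization.c`). The source file name used here is an assumption derived from the shared-object name; the archive lists the file as UNIpred_optimization.c, so adjust the name to whichever is on disk.

```r
# Sketch: compile the C source with R's own toolchain and load the result.
# The file name is an assumption; change it to match the file in the archive.
c_file <- "UNIPred_optimization.c"

# Arguments for `R CMD SHLIB`, which builds a shared object next to the source.
shlib_args <- c("CMD", "SHLIB", c_file)

if (file.exists(c_file)) {
  # Use the R binary of the running session so compiler settings match.
  status <- system2(file.path(R.home("bin"), "R"), shlib_args)
  if (status == 0) {
    # .Platform$dynlib.ext is ".so" on Linux and macOS, ".dll" on Windows.
    so_file <- sub("\\.c$", .Platform$dynlib.ext, c_file)
    dyn.load(so_file)
  }
} else {
  message("C source not found: ", c_file)
}
```

Whether UNIPred.R loads the shared object itself is not stated on this page; if it does, the `dyn.load` call is unnecessary.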
        

Usage of UNIPred: example with yeast networks

We consider a simple example that integrates two networks for predicting the FunCat [2] category "01" (Metabolism) in the yeast model organism.

First, load the UNIPred code into memory:
> source("UNIPred.R")
We use yeast data from the CRAN package 'bionetdata' [3]. We start with binary protein-protein interaction data (Yeast.STRING.data) from the STRING database [4]; in the bionetdata package these data are stored as a binary named matrix, whose names are the systematic names of yeast genes. Yeast.STRING.FunCat is a binary matrix holding the FunCat annotations of the genes included in Yeast.STRING.data. Annotations refer to the funcat-2.1 scheme, available from the MIPS web site.
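As an illustration of how these data can be accessed, the sketch below extracts the labels of one functional class from a binary annotation matrix. The helper `class_labels` and the toy matrix are hypothetical and not part of UNIPred; the column name "01" in Yeast.STRING.FunCat is assumed from the FunCat identifier and is checked before use.

```r
# Hypothetical helper: turn one column of a binary annotation matrix
# into a named 0/1 label vector.
class_labels <- function(annotations, class_id) {
  annotations <- as.matrix(annotations)
  stopifnot(class_id %in% colnames(annotations))
  labels <- annotations[, class_id]
  names(labels) <- rownames(annotations)
  labels
}

# Toy stand-in for an annotation matrix (rows: genes, columns: classes).
toy_ann <- matrix(c(1, 0, 0,
                    0, 1, 0),
                  nrow = 3,
                  dimnames = list(c("YAL001C", "YAL002W", "YAL003W"),
                                  c("01", "02")))
toy_labels <- class_labels(toy_ann, "01")

# With the real data (only runs if the 'bionetdata' package is installed).
if (requireNamespace("bionetdata", quietly = TRUE)) {
  data("Yeast.STRING.data", package = "bionetdata")
  data("Yeast.STRING.FunCat", package = "bionetdata")
  if ("01" %in% colnames(Yeast.STRING.FunCat)) {
    y <- class_labels(Yeast.STRING.FunCat, "01")
    cat("annotated:", sum(y == 1), " not annotated:", sum(y != 1), "\n")
  }
}
```

The printed counts make the imbalance between annotated and unannotated proteins visible for the chosen class.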

Then we consider binary protein-protein interactions (Yeast.Biogrid.data) downloaded from the BioGRID database [5], which collects PPI data from both high-throughput studies and conventional focused studies.

Once the UNIPred weights have been obtained for both networks, each network is extended to cover the union of the proteins, and a weighted integration is then performed using the weights computed by UNIPred.
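The extension and integration step can be sketched as follows. The function names, the toy networks, and the weights are hypothetical; the combination rule used here (a plain weighted sum with no further normalization) is an assumption, and the exact rule is the one described in [1].

```r
# Pad a named square adjacency matrix with zero rows/columns so that it
# covers every protein in `all_names`, in that order.
extend_network <- function(W, all_names) {
  stopifnot(!is.null(rownames(W)), identical(rownames(W), colnames(W)))
  out <- matrix(0, nrow = length(all_names), ncol = length(all_names),
                dimnames = list(all_names, all_names))
  out[rownames(W), colnames(W)] <- W
  out
}

# Weighted sum of networks after extending each to the union of proteins.
integrate_networks <- function(networks, weights) {
  stopifnot(length(networks) == length(weights))
  all_names <- sort(unique(unlist(lapply(networks, rownames))))
  extended <- lapply(networks, extend_network, all_names = all_names)
  Reduce(`+`, Map(`*`, extended, weights))
}

# Toy example: two small networks that share protein "B".
W1 <- matrix(c(0, 1,
               1, 0), 2, 2, dimnames = list(c("A", "B"), c("A", "B")))
W2 <- matrix(c(0, 1,
               1, 0), 2, 2, dimnames = list(c("B", "C"), c("B", "C")))

# Placeholder weights; in practice these are the values computed by UNIPred.
W <- integrate_networks(list(W1, W2), weights = c(0.8, 0.3))
print(W)
```

Because `integrate_networks` takes a list, the same call works unchanged for any number of networks, provided one weight is supplied per network.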

It is easy to extend this procedure to additional networks by computing the weight associated with each network through UNIPred. Finally, the integrated network can be given as input to a graph-based algorithm (e.g. COSNet [6]) to predict whether the unlabeled nodes (proteins) of the network belong to the functional class under study.

References

[1] Frasca, M., Bertoni, A. and Valentini, G. UNIPred: unbalance-aware Network Integration and Prediction of protein functions. Journal of Computational Biology, 22(12), 1057-1074, 2015. ISSN 1066-5277. doi: 10.1089/cmb.2014.0110.
[2] Ruepp, A. et al. The FunCat, a functional annotation scheme for systematic classification of proteins from whole genomes. Nucleic Acids Research, 32, 5539-5545, 2004.
[3] Re, M. and Valentini, G. Bionetdata, CRAN R package: http://cran.r-project.org/web/packages/bionetdata
[4] Von Mering, C. et al. Comparative assessment of large-scale data sets of protein-protein interactions. Nature, 417, 399-403, 2002.
[5] Stark, C. et al. BioGRID: a general repository for interaction datasets. Nucleic Acids Research, 34, D535-D539, 2006.
[6] Frasca, M., Bertoni, A., Re, M. and Valentini, G. A neural network algorithm for semi-supervised node label learning from unbalanced data. Neural Networks, 43, 84-98, 2013.