SINTNET

SINTNET [1] is an unbalance-aware network integration method that constructs a more reliable and informative composite network for the automated function prediction of proteins. SINTNET assigns a weight (a real number in the interval [0, 1]) to each input network according to its 'informativeness' with respect to a given functional class, taking into account the unbalance between annotated and unannotated proteins.

SINTNET code

The source code can be downloaded here. The archive contains two files:

SINTNET.R: R code implementing the SINTNET integration method
SINTNET_optimization.c: C source code implementing the core optimization procedure of SINTNET

The C source file must be compiled into the shared object SINTNET_optimization.so by running the following command from a shell (not from the R prompt):

R CMD SHLIB SINTNET_optimization.c
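
As a minimal sketch, the presence of the compiled library can be checked from R before sourcing SINTNET.R. Whether SINTNET.R loads the shared object by itself is not stated here, so the explicit dyn.load call below is an assumption. The file extension is taken from .Platform$dynlib.ext because it can differ across platforms (for example .dll on Windows).

```r
# Build the platform-specific file name of the compiled library.
so_file <- paste0("SINTNET_optimization", .Platform$dynlib.ext)

# Load the library only if it has been compiled in the working directory.
# Assumption: SINTNET.R may already do this itself when sourced.
so_loaded <- FALSE
if (file.exists(so_file)) {
  dyn.load(so_file)
  so_loaded <- TRUE
} else {
  message("Compile first with: R CMD SHLIB SINTNET_optimization.c")
}
```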

Usage of SINTNET: example with yeast networks

As a simple example, we integrate two networks for the prediction of the FunCat [2] category "01" (Metabolism) in the yeast model organism.

First we must load the SINTNET code in memory:

> source("SINTNET.R")

We use yeast data from the CRAN package 'bionetdata' [3]. We start with binary protein-protein interaction data (Yeast.STRING.data) from the STRING database [4]; in the bionetdata package these data are represented as a binary named matrix, whose names are the systematic names of yeast genes. Yeast.STRING.FunCat is a binary matrix holding the FunCat annotations of the genes included in Yeast.STRING.data. Annotations refer to the funcat-2.1 scheme, available from the MIPS web site:

> library(bionetdata); 
> data(Yeast.STRING.FunCat);
> data(Yeast.STRING.data); 
> labels <- Yeast.STRING.FunCat;
> labels[labels == 0] <- -1; 
> labels <- labels[, colnames(labels) != "00"]; # excluding the dummy "00" root (safe even if "00" is absent)
> W <- unlist(Yeast.STRING.data);
> proteins <- rownames(W); 
> labeling <- labels[proteins, 1]; # first class corresponding to "01" (Metabolism)
> names(labeling) <- proteins;
> w1 <- SINTNET(W,labeling)
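
The label preparation above can be illustrated on a small toy matrix that does not require bionetdata. The gene names, class columns and values below are invented for illustration only. The sketch recodes unannotated entries as -1, drops the dummy "00" column, selects the class by name rather than by position, and counts positives and negatives to make the unbalance of the labeling visible.

```r
# Toy annotation matrix (hypothetical genes and classes, not from bionetdata):
# rows are proteins, columns are FunCat-like classes, 1 = annotated.
toy.labels <- matrix(c(1, 0, 0, 0, 0, 0,
                       1, 1, 1, 1, 1, 1,
                       0, 1, 0, 0, 1, 0),
                     nrow = 6,
                     dimnames = list(paste0("g", 1:6), c("01", "00", "02")))

# Negatives are coded as -1, as in the yeast example.
toy.labels[toy.labels == 0] <- -1

# Drop the dummy root with a logical index; drop = FALSE keeps a matrix.
toy.labels <- toy.labels[, colnames(toy.labels) != "00", drop = FALSE]

# Select the class by name; the result is a vector named by protein.
toy.labeling <- toy.labels[, "01"]

# Count negatives and positives to inspect the class unbalance.
class.counts <- table(factor(toy.labeling, levels = c(-1, 1)))
print(class.counts)
```

Selecting the column by name avoids relying on "01" being the first column once "00" has been removed.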

Then we consider binary protein-protein interactions (Yeast.Biogrid.data) downloaded from the BioGRID database [5], which collects PPI data from both high-throughput studies and conventional focused studies:

> data(Yeast.Biogrid.data)
> data(Yeast.Biogrid.FunCat)
> labels <- Yeast.Biogrid.FunCat;
> labels[labels == 0] <- -1; 
> labels <- labels[, colnames(labels) != "00"]; # excluding the dummy "00" root (safe even if "00" is absent)
> W2 <- unlist(Yeast.Biogrid.data);
> proteins2 <- rownames(W2); 
> labeling <- labels[proteins2, 1]; # first class 
> names(labeling) <- proteins2; 
> w2 <- SINTNET(W2,labeling)

Once the SINTNET weights have been obtained for both networks, each network is extended to cover the union of the proteins (rows and columns of proteins absent from a network are filled with zeros), and a weighted integration is then performed using the weights computed by SINTNET:

> totprot <- union(proteins, proteins2)
> n <- length(totprot)
> ext.W <- matrix(0, nrow=n, ncol=n);
> rownames(ext.W) <- colnames(ext.W) <- totprot;
> ext.W2 <- matrix(0, nrow=n, ncol=n);
> rownames(ext.W2) <- colnames(ext.W2) <- totprot;
> ext.W[proteins,proteins]<-W;
> ext.W2[proteins2,proteins2]<-W2;
> integrated.W<- w1*ext.W + w2*ext.W2

This procedure extends naturally to additional networks: the weight of each network is computed with SINTNET, and the weighted extended network is added to the sum. Finally, the integrated network can be given as input to a graph-based algorithm (e.g. COSNet [6]) to predict whether the unlabeled nodes/proteins of the network belong to the functional class under study.
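
The extension to an arbitrary number of networks can be sketched with a small helper. The function integrate_networks is hypothetical and not part of SINTNET.R; it assumes that every network is a square matrix whose row and column names are identical and in the same order. The toy matrices and the weights 0.8 and 0.3 are made-up values standing in for real networks and for the output of SINTNET.

```r
# Hypothetical helper: weighted sum of named adjacency matrices defined on
# possibly different protein sets.
integrate_networks <- function(nets, weights) {
  stopifnot(length(nets) == length(weights))
  # Union of the proteins over all networks.
  all.prot <- Reduce(union, lapply(nets, rownames))
  n <- length(all.prot)
  integrated <- matrix(0, nrow = n, ncol = n,
                       dimnames = list(all.prot, all.prot))
  for (i in seq_along(nets)) {
    p <- rownames(nets[[i]])
    # Pairs involving proteins absent from network i receive no contribution.
    integrated[p, p] <- integrated[p, p] + weights[i] * nets[[i]]
  }
  integrated
}

# Toy networks and weights (invented values, not SINTNET output).
A <- matrix(c(0, 1, 1, 0), 2, dimnames = list(c("a", "b"), c("a", "b")))
B <- matrix(c(0, 1, 1, 0), 2, dimnames = list(c("b", "c"), c("b", "c")))
toy.W <- integrate_networks(list(A, B), c(0.8, 0.3))
print(toy.W)
```

With the yeast data, the equivalent call would be integrate_networks(list(W, W2), c(w1, w2)), assuming w1 and w2 are the scalar weights returned by SINTNET.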

References

[1] Frasca, M., Bertoni, A., and Valentini, G. SINTNET: unbalance-aware network integration for the automated function prediction of proteins (submitted).
[2] Ruepp, A. et al. The FunCat, a functional annotation scheme for systematic classification of proteins from whole genomes. Nucleic Acids Research, 32, 5539-5545, 2004.
[3] Re, M. and Valentini, G. Bionetdata -- CRAN R package: http://cran.r-project.org/web/packages/bionetdata
[4] Von Mering, C. et al. Comparative assessment of large-scale data sets of protein-protein interactions. Nature, 417, 399-403, 2002.
[5] Stark, C. et al. Biogrid: a general repository for interaction datasets. Nucleic Acids Research, 34, D535-D539, 2006.
[6] Frasca, M., Bertoni, A., Re, M. and Valentini, G. A neural network algorithm for semi-supervised node label learning from unbalanced data. Neural Networks, 43, 84-98, 2013.