--------------------------
  Data
--------------------------

The .rda files are R archives and must be loaded with the R command 'load'.

The archive 'enriched.string.10.net.rda' contains a named symmetric matrix W holding the 
		pairwise similarities between genes.
			W[i,j] is the weight of the connection between gene 'i' and gene 'j'.
          
The archive 'MeSH.associations.2.200.positives.04.17.rda' contains the matrix 'Y', whose 
		columns are diseases and whose rows are genes.
			Y[i,j] = 1 if gene 'i' is associated with disease 'j', 0 otherwise.
		Each column contains from 2 to 200 entries equal to 1.
			
The archive 'ancestors.2.200.positives04.17.rda' contains the matrix 'ancestors', whose 
		rows and columns are the diseases in the matrix Y.
			ancestors[i,j] = 1 if disease j is an ancestor of disease i in the MeSH hierarchy.
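
The indexing convention of 'ancestors' (row = disease, column = candidate ancestor) can be
illustrated with a small Python sketch. It is not part of the distributed data: the helper
name ancestors_of, the disease names and the 3x3 matrix are all made up, and the matrix is
held as a plain nested list rather than loaded from the archive.

```python
def ancestors_of(disease, names, ancestors):
    """Return every disease j with ancestors[i][j] == 1, where i is `disease`.

    Hypothetical helper: `names` lists the diseases in row/column order.
    """
    i = names.index(disease)
    return [names[j] for j, flag in enumerate(ancestors[i]) if flag == 1]


# Toy hierarchy (made-up names): D_root -> D_mid -> D_leaf
names = ["D_root", "D_mid", "D_leaf"]
ancestors = [
    [0, 0, 0],  # D_root has no ancestors
    [1, 0, 0],  # D_mid descends from D_root
    [1, 1, 0],  # D_leaf descends from D_root and D_mid
]

print(ancestors_of("D_leaf", names, ancestors))
```

Reading along a row gives the ancestors of a disease; reading down a column would give its
descendants.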
       

The .tsv files are tab-separated files containing the same information as the corresponding
.rda archives.

The file 'enriched.string.10.net.tsv' is in triplet sparse format: each row has the form

Gene A  Gene B   Connection weight
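
For readers working outside R, the triplet format could be read along the following lines.
This is a minimal Python sketch under stated assumptions, not the code used to produce the
data: the helper name load_triplets is hypothetical, each row is assumed to hold exactly
three tab-separated fields with no header line, and, since it is not stated whether each
gene pair is listed once or in both directions, every weight is stored symmetrically.

```python
import csv
import io
from collections import defaultdict


def load_triplets(handle):
    """Read 'gene A <TAB> gene B <TAB> weight' rows into a nested dict W.

    Hypothetical helper: assumes no header row and exactly three fields.
    """
    W = defaultdict(dict)
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) != 3:
            continue  # skip blank or malformed lines
        gene_a, gene_b, weight = row[0], row[1], float(row[2])
        # W is described as symmetric, so store both directions.
        W[gene_a][gene_b] = weight
        W[gene_b][gene_a] = weight
    return W


# Toy rows with made-up gene names and weights, for illustration only.
sample = io.StringIO("geneA\tgeneB\t0.9\ngeneA\tgeneC\t0.4\n")
W = load_triplets(sample)
print(W["geneB"]["geneA"])
```

With the real file, the same function would be called on a handle obtained from
open('enriched.string.10.net.tsv', newline='').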


In order to include all the available genes, STRING gene IDs (i.e. those starting with ENS*)
 for which no conversion to a gene name was available in the STRING mapping files 
 have been left unchanged rather than discarded.

--------------------------
  Predictions
--------------------------


The file 'gene2disco.raw.scores.terms.without.positives.04.17.rda' is an R archive 
 containing the matrix 'scores', the raw predictions in [0,1] for the 348 diseases 
 without known positive genes. Rows correspond to genes and columns to diseases. 
 By convention, scores[i,j] is set to 1000 when gene i was already associated with 
 disease j.
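
Because 1000 lies outside [0,1], it should be treated as a marker rather than a score when
ranking genes. The Python sketch below shows one way to do that. It is illustrative only:
the helper name top_candidates, the gene names and the toy values are made up, and the
matrix is held as a nested list (rows = genes, columns = diseases) rather than loaded from
the archive.

```python
KNOWN = 1000  # conventional value marking an already-known association


def top_candidates(scores, genes, j, k=3):
    """Return the k highest-scoring genes for column j, skipping known ones.

    Hypothetical helper: scores[i][j] is the score of gene i for disease j.
    """
    ranked = [
        (scores[i][j], genes[i])
        for i in range(len(genes))
        if scores[i][j] != KNOWN  # drop marker entries before ranking
    ]
    ranked.sort(key=lambda pair: pair[0], reverse=True)
    return [(gene, score) for score, gene in ranked[:k]]


# Toy data: one disease column, four made-up genes.
genes = ["g1", "g2", "g3", "g4"]
scores = [
    [0.20],
    [1000],  # g2 is already associated with the disease
    [0.75],
    [0.10],
]

print(top_candidates(scores, genes, j=0, k=2))
```

Filtering before sorting keeps the marker value from always occupying the top ranks.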


The file 'gene2disco.rawscores.terms.1.200.positives.04.17.rda' is an R archive 
 containing the matrix 'scores', the raw predictions for the 494 diseases with 
 1 to 200 associated genes. Rows correspond to genes and columns to diseases.


'gene2disease.novel.assoc.terms.without.positives.tsv' is a tab-separated file
 in which each row contains one of the considered MeSH headings without positive genes,
 the top three genes selected by our algorithm and their corresponding raw scores:

				MeSH heading Gene1/score Gene2/score Gene3/score


'gene2disease.novel.assoc.terms.1.200.positives.tsv' is a tab-separated file
 in which each row contains one of the considered MeSH headings with 1 to 200 
 positive genes, the top three non-positive genes (i.e. genes currently not associated
 with the disease) selected by our algorithm and their corresponding raw scores:

				MeSH heading Gene1/score Gene2/score Gene3/score
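
A row of either novel-association file could be split as in the Python sketch below. This
rests on an assumption that the layout line suggests but does not confirm: that each gene
field is written literally as name/score, with the fields separated by tabs. The helper
name parse_row and the sample row are made up for illustration.

```python
def parse_row(line):
    """Split one row into (MeSH heading, [(gene, score), ...]).

    Hypothetical helper: assumes tab-separated fields, with each gene
    field written as 'name/score'.
    """
    fields = line.rstrip("\n").split("\t")
    heading, pairs = fields[0], fields[1:]
    genes = []
    for pair in pairs:
        # Split on the last '/' so a gene name containing '/' stays intact.
        name, _, score = pair.rpartition("/")
        genes.append((name, float(score)))
    return heading, genes


# Made-up row, for illustration only.
row = "Toy Disease\tgeneA/0.91\tgeneB/0.85\tgeneC/0.80\n"
heading, genes = parse_row(row)
print(heading, genes)
```

If the real files separate the gene and its score differently, only the rpartition call
needs to change.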