-------------------------- Data --------------------------

The .rda files are R archives and must be loaded with the R command 'load'.

The archive 'enriched.string.10.net.rda' contains a named symmetric matrix W holding the pairwise similarities of genes: W[i,j] is the weight of the connection between gene "i" and gene "j".

The archive 'MeSH.associations.2.200.positives.04.17.rda' contains the matrix 'Y', whose rows are genes and whose columns are diseases. Y[i,j] = 1 if gene 'i' is associated with disease 'j', and 0 otherwise. Each column contains from 2 to 200 entries equal to 1.

The archive 'ancestors.2.200.positives04.17.rda' contains the matrix 'ancestors', whose rows and columns are the diseases of the matrix Y: ancestors[i,j] = 1 if disease j is an ancestor of disease i in the MeSH hierarchy.

The .tsv files are tab-separated files containing the same information as the corresponding .rda archives. The file enriched.string.10.net.tsv is in triplet sparse format, where each row has three tab-separated fields:

    Gene A    Gene B    Connection weight

In order to include all the available genes, STRING gene IDs (i.e. those starting with ENS*) for which the STRING mapping files provided no conversion to a gene name have been kept unchanged rather than discarded.

------------------------------------------------------------------------------------- Predictions --------------------------

The file 'gene2disco.raw.scores.terms.without.positives.04.17.rda' is an R archive containing the matrix 'scores', i.e. the raw predictions in [0,1] for the 348 diseases without known positive genes. Rows correspond to genes and columns to diseases. By convention, scores[i,j] is set to 1000 when gene i was already associated with disease j.

The file 'gene2disco.rawscores.terms.1.200.positives.04.17.rda' is an R archive containing the matrix 'scores', i.e. the raw predictions for the 494 diseases with 1 up to 200 associated genes. Rows correspond to genes and columns to diseases.
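Because known associations are marked with the sentinel value 1000 rather than removed, any ranking of candidate genes drawn from a 'scores' column has to skip them. The Python sketch below illustrates this for a single disease column. It assumes the column has already been extracted from 'scores' (for example after exporting the matrix from R) into a dictionary mapping gene IDs to raw scores; the function name top_novel_genes and the toy values are hypothetical, and the sketch is only an illustration, not necessarily the procedure used to build the files described here.

```python
import heapq
from typing import Dict, List, Tuple

# Sentinel used in the 'scores' matrices for gene-disease pairs that are
# already known associations (value taken from this README).
KNOWN_ASSOCIATION = 1000.0


def top_novel_genes(column: Dict[str, float], k: int = 3) -> List[Tuple[str, float]]:
    """Return the k highest-scoring genes of one disease column,
    skipping genes already associated with that disease (score == 1000)."""
    candidates = ((gene, s) for gene, s in column.items() if s != KNOWN_ASSOCIATION)
    # heapq.nlargest keeps the k best (gene, score) pairs, ranked by score.
    return heapq.nlargest(k, candidates, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Hypothetical toy column: gene ID -> raw score for a single disease.
    toy = {"TP53": 1000.0, "BRCA1": 0.91, "EGFR": 0.40, "MYC": 0.77, "KRAS": 0.12}
    print(top_novel_genes(toy))  # → [('BRCA1', 0.91), ('MYC', 0.77), ('EGFR', 0.4)]
```

The sentinel entry TP53 is excluded even though its value is the largest in the column, which is the point of checking for 1000 before ranking.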
'gene2disease.novel.assoc.terms.without.positives.tsv' is a tab-separated file containing, in each row, one of the considered MeSH headings without positive genes, followed by the top three genes selected by our algorithm and their raw scores:

    MeSH heading    Gene1/score    Gene2/score    Gene3/score

'gene2disease.novel.assoc.terms.1.200.positives.tsv' is a tab-separated file containing, in each row, one of the considered MeSH headings with 1 up to 200 positive genes, followed by the top three non-positive genes (i.e. genes not currently associated with the disease) selected by our algorithm and their raw scores:

    MeSH heading    Gene1/score    Gene2/score    Gene3/score
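The 'Gene/score' cells of the two novel-association .tsv files can be split back into gene IDs and numeric scores. The following Python sketch assumes each row consists of the MeSH heading followed by tab-separated 'gene/score' cells, with no header line; the function name parse_novel_associations and the sample rows are hypothetical. Splitting on the last '/' keeps the score intact even if an identifier happened to contain a slash.

```python
import csv
import io
from typing import Dict, List, TextIO, Tuple


def parse_novel_associations(handle: TextIO) -> Dict[str, List[Tuple[str, float]]]:
    """Map each MeSH heading to its (gene, raw score) pairs, in file order."""
    result: Dict[str, List[Tuple[str, float]]] = {}
    # QUOTE_NONE: treat quote characters in headings as ordinary text.
    for row in csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
        if not row:
            continue  # skip blank lines
        heading, *cells = row
        pairs: List[Tuple[str, float]] = []
        for cell in cells:
            if not cell:
                continue  # tolerate trailing tabs
            # Split on the last '/' so the score is always the final field.
            gene, _, score = cell.rpartition("/")
            pairs.append((gene, float(score)))
        result[heading] = pairs
    return result


if __name__ == "__main__":
    # Hypothetical toy rows, not taken from the real files.
    sample = (
        "Disease A\tGENE1/0.95\tGENE2/0.9\tGENE3/0.85\n"
        "Disease B\tENSP0001/0.7\tGENE4/0.6\tGENE5/0.5\n"
    )
    for heading, genes in parse_novel_associations(io.StringIO(sample)).items():
        print(heading, genes)
```

To read one of the actual files, pass an open text handle instead of the StringIO object, e.g. open(path, newline="") with the file name taken from the list above.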