Seems pretty similar to the cmp-digenei result. Should note that CTD validation and training validation are unchanged, since those only depend on the predictions.
Mouse direct connections are done. Write a simple cmd-line tool to grep results? What other organisms would be useful – yeast perhaps?
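A rough sketch of what such a grep tool could look like. Everything here is an assumption: the results files are taken to be tab-separated with one association per line, and the names `matching_rows`, `main` and the `--column` option are made up for illustration, not part of any existing script.

```python
"""Hypothetical result-grep sketch: print rows whose fields match a term."""
import argparse


def matching_rows(lines, term, column=None, sep="\t"):
    """Yield rows (newline stripped) where a field equals `term`.

    Matching is case-insensitive and on the whole field. If `column`
    is None, any field may match; otherwise only that 0-based column.
    Assumes tab-separated rows; the real file layout may differ.
    """
    needle = term.lower()
    for line in lines:
        row = line.rstrip("\n")
        fields = row.split(sep)
        if column is None:
            candidates = fields
        elif 0 <= column < len(fields):
            candidates = [fields[column]]
        else:
            candidates = []  # row too short for the requested column
        if any(field.lower() == needle for field in candidates):
            yield row


def main(argv=None):
    """Parse arguments and print matching rows, prefixed by file name."""
    parser = argparse.ArgumentParser(
        description="Look up a gene or disease term in results files.")
    parser.add_argument("term", help="gene or disease name to look up")
    parser.add_argument("files", nargs="+", help="results files to search")
    parser.add_argument("--column", type=int, default=None,
                        help="0-based column to match (default: any)")
    args = parser.parse_args(argv)
    for path in args.files:
        with open(path) as handle:
            for row in matching_rows(handle, args.term, args.column):
                print(f"{path}\t{row}")
    return 0
```

Calling `main()` under an `if __name__ == "__main__":` guard would turn this into a script taking a term followed by one or more results files. Whole-field matching is a deliberate choice over substring matching, so that a short gene symbol does not also pull in every longer symbol that contains it.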
I really need a web interface – somewhere people can look up via gene or disease, or upload a list and get a bunch back.
Still need to merge the gene/disease backgrounds into p-values. Likely this will be two more files – another prefix perhaps to indicate the background – maybe something like:
hum-gene2pubmed-gene-mesh-p.txt -> gene2pubmedBG-gene2pubmed-gene-mesh-p.txt
disease-comesh-p.txt -> diseaseBG-disease-comesh-p.txt
In these cases, the filtering (hum or disease) needs to be done first, since computing p-values over the unfiltered superset makes no sense (done).
OK, the names are starting to get a bit messy, with hum-gene2pubmed-gene-mesh.txt vs hum-gene-gene2pubmed-mesh-refs.txt
Need better documentation for filter-file.py: the --field parameter was used incorrectly, which caused errors in the last makefile (blech!)