Time for the yearly minor updates – I’m now at McGill, mostly getting settled into the (Epi)Genomics analysis here at the Innovation Centre. A brief reminder of this website also reminds me that I really ought to do all the niggly but important housekeeping I’ve been putting off, so this afternoon I’ve been running around my directories initialising git repositories. Version control before disaster: always the best time to implement it!
It’s actually gotten harder to separate academic spam out of my inbox now that I have a doctorate. It used to be that anything addressed to Dr. Warren was either spam or at least sent in error (I’ve had a couple of students ask to work in “my” lab); now I have to actually read these more carefully. Still mostly spam, though.
It’s been a bit quiet on the science blog here, mostly because I’ve been whacking everything into papers. What feels like a billion iterations later, the dust seems to be settling into three or four papers. Right now there are two: the initial paper in BMC Bioinformatics and a second paper in Genome Medicine. I’m mostly excited that both have already picked up a “Highly accessed” tag, which hopefully means that people are interested in the method! I’m finally putting the finishing touches on a third paper, and now I finally get back to playing with research again. YAY!!!
Things I’ve been reminded to do after seeing things at RECOMB:
MeSH term attachment (general paper) – this data is running! HA! After that, we can run the validation!
Stability check – how predictions change with missing annotations and misannotations
Bayesian mode for predicting term attachment – P(term|papers) = P(term)P(papers|term)/P(papers)
Break down the AUC by term (can actually ignore the tree and do it per term…) – for this I should probably rewrite the AUC calculator as an object…
Hausdorff distance == likelihood
PageRank/social network analysis (especially for the author data!)
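The Bayesian term-attachment item above is just Bayes' rule. A minimal sketch, assuming each MeSH term is treated as a binary present/absent label so that P(papers) expands over "term" and "not term"; the function name and the assumption that both likelihoods are already computed are hypothetical, not part of the existing pipeline:

```python
def term_posterior(prior, lik_given_term, lik_given_not_term):
    """Return P(term | papers) via Bayes' rule for a binary term label.

    prior:              P(term), e.g. the fraction of entities annotated with the term
    lik_given_term:     P(papers | term)
    lik_given_not_term: P(papers | not term)
    """
    if not 0.0 <= prior <= 1.0:
        raise ValueError("prior must be a probability")
    # P(papers) by the law of total probability over term / not term
    evidence = prior * lik_given_term + (1.0 - prior) * lik_given_not_term
    if evidence == 0.0:
        raise ValueError("P(papers) is zero; posterior undefined")
    return prior * lik_given_term / evidence
```

For example, a prior of 0.5 with likelihoods of 0.8 (term) versus 0.2 (no term) gives a posterior of 0.8. How P(papers|term) itself is estimated is the open question the item leaves.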
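For breaking the AUC down by term, one way to rewrite the AUC calculator as an object is to accumulate scores per term and compute each term's AUC independently, ignoring the tree. A sketch, assuming predictions arrive as (term, score, is_positive) triples; the class and method names are hypothetical. It uses the pairwise Mann-Whitney formulation (ties count as half), which costs O(positives × negatives) per term and is fine for modest sizes:

```python
from collections import defaultdict


class PerTermAUC:
    """Accumulate (term, score, label) predictions and report AUC per term."""

    def __init__(self):
        # term -> (scores of positive examples, scores of negative examples)
        self._scores = defaultdict(lambda: ([], []))

    def add(self, term, score, is_positive):
        positives, negatives = self._scores[term]
        (positives if is_positive else negatives).append(score)

    def auc(self, term):
        """Probability a random positive outscores a random negative (ties = 0.5).

        Returns None when the term lacks either positives or negatives.
        """
        positives, negatives = self._scores.get(term, ([], []))
        if not positives or not negatives:
            return None
        wins = 0.0
        for p in positives:
            for n in negatives:
                if p > n:
                    wins += 1.0
                elif p == n:
                    wins += 0.5
        return wins / (len(positives) * len(negatives))

    def all_aucs(self):
        """AUC for every term seen so far (None where undefined)."""
        return {term: self.auc(term) for term in self._scores}
```

Terms with only positives or only negatives come back as None rather than a misleading 0 or 1, which matters once the breakdown goes below the top of the tree.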
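For the PageRank item, a minimal power-iteration sketch over a co-authorship graph, assuming each paper is given as a list of author names (a hypothetical input format). Co-authorship is undirected, so each edge is stored in both directions; single-author papers add no edges, and any node without out-links has its rank spread uniformly. This is plain textbook PageRank, not a tuned social-network analysis:

```python
from collections import defaultdict
from itertools import combinations


def coauthor_graph(papers):
    """Build an undirected co-authorship graph: author -> set of co-authors."""
    graph = defaultdict(set)
    for authors in papers:
        for a, b in combinations(sorted(set(authors)), 2):
            graph[a].add(b)
            graph[b].add(a)
    return dict(graph)


def pagerank(graph, damping=0.85, tol=1e-10, max_iter=100):
    """Power-iteration PageRank on a dict of node -> iterable of out-neighbours."""
    nodes = set(graph)
    for nbrs in graph.values():
        nodes.update(nbrs)
    n = len(nodes)
    if n == 0:
        return {}
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Rank held by nodes with no out-links is redistributed uniformly.
        dangling = sum(rank[v] for v in nodes if not graph.get(v))
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for v, nbrs in graph.items():
            if nbrs:
                share = damping * rank[v] / len(nbrs)
                for u in nbrs:
                    new[u] += share
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:
            break
    return rank
```

Usage would be along the lines of `pagerank(coauthor_graph(author_lists))`, where `author_lists` is a hypothetical per-paper list of authors; the ranks sum to 1.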
As for results, it seems that the validation set for pharma-chem/disease annotations is ZERO. I should generate a histogram of annotations over time (this is probably interesting in general); the histogram is in progress. Potential alternative avenues are attaching all MeSH terms rather than just disease terms, or looking at the attachment of new pharmacological actions (the new entries in txt/mesh/mesh_pharma.txt). Will need to compute chem<->all MeSH profiles.
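The histogram of annotations over time is a simple count per year. A sketch, assuming the annotations can be dumped as tab-separated `pmid<TAB>term<TAB>year` lines (a hypothetical format, not an existing file in the pipeline); the optional `terms` filter would let the same code cover the disease-only and all-MeSH variants:

```python
from collections import Counter


def parse_annotations(lines):
    """Parse hypothetical 'pmid<TAB>term<TAB>year' lines into tuples."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        pmid, term, year = line.split("\t")
        yield pmid, term, int(year)


def annotation_histogram(records, terms=None):
    """Count annotations per year, optionally restricted to a set of terms."""
    counts = Counter(year for _, term, year in records
                     if terms is None or term in terms)
    return sorted(counts.items())


def print_histogram(hist, width=50):
    """Print a plain-text bar chart scaled to the busiest year."""
    peak = max((count for _, count in hist), default=0)
    for year, count in hist:
        bar = "#" * (round(width * count / peak) if peak else 0)
        print(f"{year} {count:8d} {bar}")
```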
Also want to
Database set up and PubMed files transferred; this leaves only Entrez Gene and the MeSH files to be grabbed (in integrator/Archive now!)
The getMeSH script hardcodes the year being grabbed, so I had to switch it to the 2011 files.
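One way to avoid editing the script every year is to take the year as an argument. A sketch only: the real getMeSH script isn't shown here, so the filename templates below are assumptions (NLM's ASCII MeSH releases have used names like `d2011.bin`, `c2011.bin` and `q2011.bin`, but check them against what getMeSH actually fetches), and no download logic is included:

```python
import argparse
import datetime

# Hypothetical file templates: assumed to match NLM's ASCII MeSH naming
# (descriptors, supplementary concepts, qualifiers); verify before use.
MESH_FILE_TEMPLATES = ("d{year}.bin", "c{year}.bin", "q{year}.bin")


def mesh_filenames(year):
    """Return the MeSH filenames for a given release year."""
    return [template.format(year=year) for template in MESH_FILE_TEMPLATES]


def main(argv=None):
    parser = argparse.ArgumentParser(description="List MeSH files for a release year.")
    parser.add_argument("--year", type=int, default=datetime.date.today().year,
                        help="MeSH release year (default: current year)")
    args = parser.parse_args(argv)
    names = mesh_filenames(args.year)
    for name in names:
        print(name)
    return names


if __name__ == "__main__":
    main()
```

Running it with `--year 2011` would list the 2011 files without touching the code.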
Had to remember to set up the database files: if the database access script fails silently (as it does if you call a database that’s not in .dbrc), you get weird errors.
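The silent failure is easy to turn into a loud one by checking the requested name before connecting. A sketch, assuming a hypothetical `.dbrc` layout of one `name field1 field2 ...` entry per line (the real file format isn't described here); the function and exception names are also hypothetical:

```python
import os


class UnknownDatabaseError(LookupError):
    """Raised when a requested database has no entry in the .dbrc file."""


def parse_dbrc(lines):
    """Parse a hypothetical .dbrc layout: 'name field1 field2 ...' per line."""
    entries = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, *fields = line.split()
        entries[name] = fields
    return entries


def lookup_db(entries, name, source=".dbrc"):
    """Return the config for `name`, failing loudly instead of silently."""
    try:
        return entries[name]
    except KeyError:
        known = ", ".join(sorted(entries)) or "none"
        raise UnknownDatabaseError(
            f"database {name!r} is not defined in {source} (known: {known})"
        ) from None


def load_db_config(name, path=None):
    """Read ~/.dbrc (or `path`) and return the entry for `name`."""
    path = path or os.path.expanduser("~/.dbrc")
    with open(path) as fh:
        return lookup_db(parse_dbrc(fh), name, source=path)
```

The error message lists the databases that *are* defined, which usually makes a missing entry obvious immediately.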
Looks like some of the previous builds are almost done… and it looks like I might need to rebuild the geneRIF bits?
Still waiting for word on the paper – should probably follow up tonight?
Downloaded the 2011 PubMed files; need to set up a wcdb5 to house them. Currently scp’ing the baseline over to chickenwire, then need to move it into position for the build. Also need versions of Entrez Gene, MeSH, etc…
ALSO, need to update the website and mention exactly which versions of which files are live on the databases.
Grabbed pubmed-chem-term.txt and put it into integrator/mesh-chem. Will match it against the DrugBank database to get a list of non-matching pharma. Also, get the list of compounds with a pharmacological action, and see how much restricting to those loses.
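Both checks reduce to set operations once the names are loaded. A sketch, assuming the MeSH chemical terms and the DrugBank names are available as plain lists of names (the parsing of pubmed-chem-term.txt and of DrugBank itself is not shown, and the function names are hypothetical). Matching here is exact after case and whitespace normalisation only, so synonyms and salt forms will show up as non-matches:

```python
def normalise(name):
    """Case- and whitespace-insensitive key for comparing compound names."""
    return " ".join(name.casefold().split())


def unmatched_terms(mesh_terms, drugbank_names):
    """MeSH chemical terms with no exact (normalised) DrugBank name match."""
    known = {normalise(n) for n in drugbank_names}
    return sorted({t.strip() for t in mesh_terms if normalise(t) not in known})


def fraction_lost(compounds, with_pharm_action):
    """Fraction of compounds dropped if only those with a pharmacological action are kept."""
    compounds = set(compounds)
    if not compounds:
        return 0.0
    kept = compounds & set(with_pharm_action)
    return 1.0 - len(kept) / len(compounds)
```

If the unmatched list turns out to be dominated by synonyms, a synonym table would be the natural next step before drawing conclusions.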
Re: circular make. Digenei4 seems to have choked with a “directory doesn’t exist” error for a directory that does exist. Maybe a node with a file-system problem?
Suddenly realised that RECOMB is nearly upon us – if I want to put in some new figures, now’s the time!
Seems like there’s a circular/non-updating portion in the Makefile… or, hopefully, it’s just that I’ve been twiddling with the makefiles and it’s been unhappy. Here’s hoping a final build will suffice to finish things off. And then it’s probably time to build a new version with the new PubMed… Let’s go and download that now, shall we?
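GNU make normally warns when it drops a circular dependency, but with several makefiles it can be hard to see the whole loop. A sketch of finding one with a depth-first search, assuming the target -> prerequisites map has already been extracted into a dict (how to extract it is left open; the function name is hypothetical):

```python
def find_cycle(deps):
    """Return one dependency cycle as a list of targets, or None if acyclic.

    deps maps each target to an iterable of its prerequisites.
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    colour = {}
    path = []

    def visit(node):
        colour[node] = GREY
        path.append(node)
        for dep in deps.get(node, ()):
            state = colour.get(dep, WHITE)
            if state == GREY:
                # dep is already on the current path: that's the loop
                return path[path.index(dep):] + [dep]
            if state == WHITE:
                cycle = visit(dep)
                if cycle:
                    return cycle
        path.pop()
        colour[node] = BLACK
        return None

    for node in list(deps):
        if colour.get(node, WHITE) == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None
```

The search is recursive, which is fine for makefile-sized graphs; a very deep dependency chain would need an explicit stack instead.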