Archive for the ‘SQL’ Category
Homologene Results are In!
Posted in R, SQL on September 14, 2007
Had to tweak the data dump scripts, but it looks like the HomoloGene human-to-mouse-to-disease mapping is done.
Starting to get leery of evidence from low numbers of PubMed articles. These tend to result in nice low p-values, but I’m guessing that they’re biased by low sample size. Then again, aren’t p-values supposed to [...]
Reloaded…and then???
Posted in SQL on August 28, 2007
After the reload, jumped right in and tried the big query. And no error…but no result?! Starting to get frustrated, says my brain. But it turns out that the mesh table was incorrectly loaded: darn difference between a comma and a semicolon! Trying to re-run now with the corrected table.
Also [...]
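A delimiter mix-up like the comma versus semicolon one above can be caught before loading by counting fields per row. The following is only a minimal sketch: the sample rows, the expected three-field layout, and the `field_counts` helper are all made up for illustration and are not the actual mesh file or loader.

```python
import csv
import io

def field_counts(text, delimiter):
    """Return the set of per-row field counts when parsing text with this delimiter."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    return {len(row) for row in reader if row}

# Hypothetical sample: two semicolon-separated rows with three fields each.
sample = "id1;term one;A01\nid2;term two;B02\n"

wrong = field_counts(sample, ",")   # wrong delimiter: each row collapses into one field
right = field_counts(sample, ";")   # right delimiter: each row has three fields

print("comma:", wrong)
print("semicolon:", right)
```

If the set of field counts is not exactly the expected column count, the load is probably going to produce a table that returns no errors and no results.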
Web Version
Posted in SQL, rails, ruby on August 24, 2007
Starting to look at what will be needed to make a web accessible version of the database live. Ruby on Rails looks promising – OO scripting language, automagic linking to databases and building webpages. What’s not to like?
Downsides – looks like it needs new-style MySQL password hashing – it looks like the [...]
Reloaded!
Posted in SQL on August 24, 2007
Concatenated all the loading commands into a script. The reloading went off without a hitch last night, so here’s crossing my fingers hoping that first order related articles will work out.
Things to consider doing while I’m in the reloading frame of mind: writing scripts to automatically download newer versions of all the [...]
SQL Woes
Posted in SQL on August 23, 2007
Currently suffering from SQL limitation setbacks – hitting a situation where temporary tables are getting big (then again, we’re in the millions or billions of records…maybe there are temp file limitations).
On the upside, I’m happy to discover that MySQL text client has tab completion…
Current status – I’ve loaded homologene, and I’ve crafted queries for related articles. [...]
Biting the Bullet
Posted in SQL, progress on June 11, 2007
A bit anticlimactic: I had previously waited forever for Entrez Gene and all the GeneRIFs to convert from XML to text files. So rather than doing that myself, I wrote the code to load from text files on the Entrez Gene FTP server. Have other people do most of my [...]
Optimise
Posted in SQL, progress, todo on April 28, 2007
Sounds like a transformers yell…“Transformers, Optimise!”
Anyways, thing to fix: all the CREATE TABLE AS SELECT…. create tables without keys. No key is definitely bad. Trying this for the one that really matters – the gene_term_related_citations query. EXPLAIN seems happier, but proof will be in the pudding (or [...]
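To illustrate the missing-key problem, here is a small sketch using Python's built-in SQLite module as a stand-in for MySQL (an assumption made only so the example is self-contained; the planner output and syntax details differ between the two). The table and index names (`citations`, `gene_citations`, `idx_gene`) are hypothetical, not the real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical source table that does have a key.
cur.execute(
    "CREATE TABLE citations (gene_id INTEGER, pmid INTEGER, "
    "PRIMARY KEY (gene_id, pmid))"
)
cur.executemany(
    "INSERT INTO citations VALUES (?, ?)",
    [(g, p) for g in range(100) for p in range(10)],
)

# CREATE TABLE ... AS SELECT copies the rows but not the key or any index.
cur.execute("CREATE TABLE gene_citations AS SELECT gene_id, pmid FROM citations")
indexes_before = cur.execute("PRAGMA index_list('gene_citations')").fetchall()

def plan(sql):
    """Return the query planner's description of how it would run sql."""
    rows = cur.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " ".join(row[-1] for row in rows)

query = "SELECT pmid FROM gene_citations WHERE gene_id = 42"
before = plan(query)  # no index available, so a full table scan

# Add the index by hand after the table has been created.
cur.execute("CREATE INDEX idx_gene ON gene_citations (gene_id)")
after = plan(query)   # the planner can now use idx_gene

print(indexes_before)
print(before)
print(after)
```

In MySQL the equivalent fix would be a `CREATE INDEX` or `ALTER TABLE ... ADD INDEX` on the derived table, followed by re-running `EXPLAIN` on the slow query to see whether the index is picked up.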
A cunning plan
Posted in SQL, progress on April 26, 2007
Time to get cracking on the optimisation. The dual plans are to
Expand the TF database with taxonids (and eventually load all of Entrez Gene)
Optimise queries using table indexes
Looks optimistic – I like orders of magnitude improvements.
Just hacked the Entrez Gene fetcher to extract the taxonid from the XML. I should look into [...]
Related Articles – Pulling the Plug
Posted in SQL on April 24, 2007
Got back from CGDN, and the “genes to related articles” table creation query was still stuck at “copying to tmp table”. Since that makes it a solid handful of days, I’m kind of scared as to exactly how big this table will be. So, in the spirit of not emptying disk partitions, I’ve pulled the [...]