Archive for the ‘SQL’ Category
Homologene Results are In!
Posted in R, SQL on September 14, 2007
Had to tweak the data dump scripts, but it looks like the HomoloGene human to mouse to disease mapping is done. Starting to get leery of evidence from low numbers of PubMed articles. These tend to result in nice low p-values, but I’m guessing that they’re biased by low sample size. Then again, aren’t p-values supposed [...]
Reloaded…and then???
Posted in SQL on August 28, 2007
After the reload, jumped right in and tried the big query. And no error…but no result?! Starting to get frustrated, says my brain. But turns out that the mesh table was incorrectly loaded – darn difference between a comma and semi-colon! Trying to re-run now with corrected results. Also looking at using the homologene results. [...]
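The comma versus semi-colon mix-up is the kind of thing a quick pre-load check can catch. Below is a minimal sketch in Python, not the loader actually used here: the sample rows, the two-column layout, and the `parse_rows` helper are all made up for illustration. It parses the text with a given delimiter and complains if any row does not have the expected number of columns.

```python
import csv
import io

# Made-up sample rows in a two-column, semi-colon separated layout (an assumption).
SAMPLE = "T1;first term\nT2;second term\n"


def parse_rows(text, delimiter, expected_columns=2):
    """Parse delimited text and fail loudly if any row has the wrong column count."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    bad = [row for row in rows if len(row) != expected_columns]
    if bad:
        raise ValueError(
            f"{len(bad)} row(s) do not have {expected_columns} columns "
            f"with delimiter {delimiter!r}; first offender: {bad[0]}"
        )
    return rows


if __name__ == "__main__":
    # The right delimiter splits each row into two fields.
    print(parse_rows(SAMPLE, ";"))
    # The wrong delimiter leaves each row as a single field, so the check raises.
    try:
        parse_rows(SAMPLE, ",")
    except ValueError as err:
        print("load would have gone wrong:", err)
```

Running a check like this before the bulk load turns a silent "no error, no result" into an immediate failure.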
Web Version
Posted in rails, ruby, SQL on August 24, 2007
Starting to look at what will be needed to make a web accessible version of the database live. Ruby on Rails looks promising – OO scripting language, automagic linking to databases and building webpages. What’s not to like? Downsides – looks like it needs new-style MySQL password hashing – it looks like the systems here [...]
Reloaded!
Posted in SQL on August 24, 2007
Concatenated all the loading commands into a script. The reloading went off without a hitch last night, so here’s crossing my fingers hoping that first order related articles will work out. Things to consider doing while I’m in the reloading frame of mind: writing scripts to automatically download newer versions of all the relevant databases [...]
SQL Woes
Posted in SQL on August 23, 2007
Currently suffering from SQL limitation setbacks – hitting a situation where temporary tables are getting big (then again, we’re in the millions or billions of records…maybe there are temp file limitations). On the upside, I’m happy to discover that the MySQL text client has tab completion… Current status – I’ve loaded homologene, and I’ve crafted queries for related [...]
Biting the Bullet
Posted in progress, SQL on June 11, 2007
A bit anticlimactic – I had previously waited forever for Entrez Gene and all the GeneRIFs to convert from XML to text files. So rather than doing that myself, I wrote the code to load from text files on the Entrez Gene FTP server. Have other people do most of my processing and all [...]
Optimise
Posted in progress, SQL, todo on April 28, 2007
Sounds like a Transformers yell…“Transformers, Optimise!” Anyway, the thing to fix: all the CREATE TABLE AS SELECT… statements create tables without keys. No key is definitely bad. Trying this for the one that really matters – the gene_term_related_citations query. EXPLAIN seems happier, but proof will be in the pudding (or the running, in this case). Dang…optimised query [...]
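To illustrate the missing-keys problem, here is a minimal sketch in Python. It uses the standard library's SQLite rather than MySQL, so the details differ from the real setup, and the `gene_citation` tables, columns, and index name are hypothetical stand-ins rather than the actual schema. The point it shows is the same: a table built with CREATE TABLE AS SELECT carries no keys from its source, and adding an index afterwards changes the query plan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical source table with a primary key.
cur.execute(
    "CREATE TABLE gene_citation ("
    "gene_id INTEGER, pmid INTEGER, PRIMARY KEY (gene_id, pmid))"
)
cur.executemany(
    "INSERT INTO gene_citation VALUES (?, ?)",
    [(g, g * 1000 + p) for g in range(100) for p in range(10)],
)

# The derived table copies the rows but none of the keys.
cur.execute(
    "CREATE TABLE gene_citation_copy AS SELECT gene_id, pmid FROM gene_citation"
)


def index_names(table):
    """Names of the indexes SQLite knows about for a table."""
    return [row[1] for row in cur.execute(f"PRAGMA index_list({table})")]


def query_plan(sql):
    """Human-readable plan text (fourth column of EXPLAIN QUERY PLAN)."""
    return " ".join(row[3] for row in cur.execute("EXPLAIN QUERY PLAN " + sql))


QUERY = "SELECT pmid FROM gene_citation_copy WHERE gene_id = 42"

indexes_before = index_names("gene_citation_copy")
plan_before = query_plan(QUERY)

# Add the key by hand after the fact.
cur.execute("CREATE INDEX idx_copy_gene ON gene_citation_copy (gene_id)")

indexes_after = index_names("gene_citation_copy")
plan_after = query_plan(QUERY)

if __name__ == "__main__":
    print("before:", indexes_before, "|", plan_before)
    print("after: ", indexes_after, "|", plan_after)
```

In MySQL the equivalent fix would be an ALTER TABLE … ADD INDEX after the CREATE TABLE AS SELECT, or declaring the keys in the CREATE TABLE part of the statement.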
A cunning plan
Posted in progress, SQL on April 26, 2007
Time to get cracking on the optimisation. The dual plans are to (1) expand the TF database with taxonids (and eventually load all of Entrez Gene) and (2) optimise queries using table indexes. Looks optimistic – I like orders of magnitude improvements. Just hacked the Entrez Gene fetcher to extract the taxonid from the XML. I should look [...]
Related Articles – Pulling the Plug
Posted in SQL on April 24, 2007
Got back from CGDN, and the “genes to related articles” table creation query was still stuck at “copying to tmp table”. Since that makes it a solid handful of days, I’m kind of scared as to exactly how big this table will be. So, in the spirit of not emptying disk partitions, I’ve pulled the [...]