Archive for the ‘SQL’ Category

Homologene Results are In!

Had to tweak the data dump scripts, but it looks like the HomoloGene human-to-mouse-to-disease step is done.
Starting to get leery of evidence from small numbers of PubMed articles. These tend to give nice low p-values, but I’m guessing they’re biased by the small sample size. Then again, aren’t p-values supposed to [...]
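
To put a number on the small-sample worry: under a one-sided hypergeometric (Fisher-style) test, which is an assumption here since the post doesn’t say which test produced the p-values, a gene with a single article that happens to carry a rare term already gets a tiny p-value. A minimal Python sketch; the article counts are made-up illustrative numbers, not results from this database:

```python
from math import comb


def hypergeom_upper_tail(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) for X ~ Hypergeometric: N articles in total, K of them
    tagged with the term, n articles linked to the gene."""
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return hits / total


# Hypothetical counts: a term that tags 10 out of 1,000,000 articles.
# Gene A has a single article, and it carries the term.
p_single = hypergeom_upper_tail(k=1, n=1, K=10, N=1_000_000)
# Gene B has 100 articles, and exactly one of them carries the term.
p_many = hypergeom_upper_tail(k=1, n=100, K=10, N=1_000_000)

print(f"1 hit in 1 article:    p = {p_single:.2e}")
print(f"1 hit in 100 articles: p = {p_many:.2e}")
```

One hit out of one article gives p = 1e-5, while one hit out of a hundred articles gives roughly 1e-3, even though the single-article case rests on exactly one observation.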

Reloaded…and then???

After the reload, I jumped right in and tried the big query. No error…but no result?! Starting to get frustrated, says my brain. But it turns out the MeSH table was loaded incorrectly: darn difference between a comma and a semicolon! Re-running now with the corrected table.
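
A cheap guard against the comma/semicolon mix-up is to count the fields in each row using the delimiter you are about to hand to LOAD DATA (its FIELDS TERMINATED BY clause) before loading. A minimal Python sketch; the sample rows and the two-column layout are hypothetical, not the actual MeSH file format:

```python
import csv
from typing import Iterable, List, Tuple


def bad_rows(lines: Iterable[str], delimiter: str, expected_fields: int) -> List[Tuple[int, int]]:
    """Return (line number, field count) for each row that does not split
    into the expected number of fields with the given delimiter."""
    bad = []
    for lineno, row in enumerate(csv.reader(lines, delimiter=delimiter), start=1):
        if len(row) != expected_fields:
            bad.append((lineno, len(row)))
    return bad


# Hypothetical two-column, semicolon-delimited sample rows.
SAMPLE = ["id1;heading one\n", "id2;heading two\n"]

print(bad_rows(SAMPLE, ",", 2))  # [(1, 1), (2, 1)]: wrong delimiter, one field per row
print(bad_rows(SAMPLE, ";", 2))  # []: every row splits into two fields
```

If the check comes back non-empty, the delimiter in the load command is the first thing to look at.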
Also [...]

Web Version

Starting to look at what’s needed to put a web-accessible version of the database live. Ruby on Rails looks promising: Ruby is an OO scripting language, and Rails does automagic linking to databases and building of webpages. What’s not to like?
Downsides: it seems to need new-style MySQL password hashing, and it looks like the [...]
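
For reference, the two MySQL hash styles can be told apart by shape: pre-4.1 PASSWORD() hashes are 16 hex characters, while 4.1+ hashes are 41 characters, a `*` followed by the uppercase hex SHA-1 of the binary SHA-1 of the password. A minimal Python sketch for checking which style an account’s stored hash uses; the function names are hypothetical helpers, not part of MySQL or Rails:

```python
import hashlib


def mysql41_password_hash(password: str) -> str:
    """New-style (MySQL 4.1+) PASSWORD() hash: '*' + uppercase hex of SHA1(SHA1(password))."""
    inner = hashlib.sha1(password.encode("utf-8")).digest()  # binary digest, not hex
    return "*" + hashlib.sha1(inner).hexdigest().upper()


def hash_style(stored: str) -> str:
    """Classify a stored password hash (e.g. from mysql.user) by its shape alone."""
    if len(stored) == 41 and stored.startswith("*"):
        return "new (4.1+)"
    if len(stored) == 16:
        return "old (pre-4.1)"
    return "unknown"


print(hash_style(mysql41_password_hash("secret")))  # new (4.1+)
print(hash_style("0123456789abcdef"))               # old (pre-4.1), made-up value
```

If an account still has an old-style hash, resetting its password on a 4.1+ server with old_passwords turned off should store a new-style one.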

Reloaded!

Concatenated all the loading commands into a script. The reload went off without a hitch last night, so I’m crossing my fingers that the first-order related articles will work out.
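
One way to keep a reload script in sync with the data files is to generate it from a single table-to-file list, with the field delimiter spelled out per table. A minimal Python sketch, assuming MySQL’s LOAD DATA LOCAL INFILE and full reloads (hence the TRUNCATE); every table name, path and delimiter below is hypothetical:

```python
from typing import List, Tuple

# Hypothetical (table, file, field delimiter, header lines to skip) entries.
# The real table names, dump files and delimiters are not given in the post.
LOADS: List[Tuple[str, str, str, int]] = [
    ("gene_info", "/data/ncbi/gene_info", "\t", 1),
    ("gene2pubmed", "/data/ncbi/gene2pubmed", "\t", 1),
    ("mesh_terms", "/data/mesh/mesh_terms.txt", ";", 0),
]


def sql_delimiter(delimiter: str) -> str:
    # Write a tab as the MySQL escape sequence \t; other characters go in as-is.
    return "\\t" if delimiter == "\t" else delimiter


def load_statement(table: str, path: str, delimiter: str, skip: int) -> str:
    stmt = (f"LOAD DATA LOCAL INFILE '{path}' INTO TABLE {table} "
            f"FIELDS TERMINATED BY '{sql_delimiter(delimiter)}' "
            "LINES TERMINATED BY '\\n'")
    if skip:
        stmt += f" IGNORE {skip} LINES"
    return stmt + ";"


def build_script(loads: List[Tuple[str, str, str, int]]) -> str:
    lines = ["-- Reload script: empty each table, then reload it from its dump file."]
    for table, path, delimiter, skip in loads:
        lines.append(f"TRUNCATE TABLE {table};")
        lines.append(load_statement(table, path, delimiter, skip))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_script(LOADS), end="")
```

The output can be redirected to a file and fed to the mysql client, e.g. `python make_reload.py > reload.sql` and then `mysql your_db < reload.sql` (script and database names hypothetical).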
Things to consider doing while I’m in the reloading frame of mind: writing scripts to automatically download newer versions of all the [...]

SQL Woes

Currently suffering from SQL limitation setbacks: I’m hitting a situation where temporary tables are getting big (then again, we’re in the millions or billions of records, so maybe there are temp file limitations).
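
Some back-of-the-envelope arithmetic helps here. In MySQL, an internal temporary table stays in memory only up to the smaller of tmp_table_size and max_heap_table_size; beyond that it is converted to an on-disk table, so at billions of rows it ends up on disk in the server’s tmpdir. A minimal Python sketch; the 16 MB settings and the 12-byte row are assumptions for illustration, not this server’s actual values:

```python
MB = 1024 ** 2
GiB = 1024 ** 3


def in_memory_limit(tmp_table_size: int, max_heap_table_size: int) -> int:
    # An internal temporary table stays in memory only up to the smaller
    # of these two settings; past that it is converted to an on-disk table.
    return min(tmp_table_size, max_heap_table_size)


def estimated_bytes(rows: int, bytes_per_row: int) -> int:
    # Raw payload only: ignores per-row and index overhead, so this is a floor.
    return rows * bytes_per_row


LIMIT = in_memory_limit(16 * MB, 16 * MB)  # assumed settings; check SHOW VARIABLES
ROW_BYTES = 12  # hypothetical row: three 4-byte INT columns (gene, term, PMID)

for rows in (10 ** 6, 10 ** 9):
    size = estimated_bytes(rows, ROW_BYTES)
    where = "fits in memory" if size <= LIMIT else "spills to disk"
    print(f"{rows:,} rows -> {size / GiB:.2f} GiB, {where}")
```

A million such rows fit comfortably; a billion come to more than 11 GiB before any overhead, so disk space in tmpdir becomes the real constraint.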
On the upside, I’m happy to discover that the MySQL text client has tab completion…
Current status: I’ve loaded HomoloGene and crafted queries for related articles. [...]

Biting the Bullet

A bit anticlimactic: I had previously waited forever for Entrez Gene and all the GeneRIFs to convert from XML to text files. So rather than doing that myself, I wrote code to load from the text files on the Entrez Gene FTP server. Have other people do most of my [...]
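
Loading from the FTP text files mostly comes down to splitting tab-delimited lines. A minimal Python sketch for a GeneRIF file, assuming the five-column generifs_basic layout (tax ID, Gene ID, comma-separated PMID list, last-update timestamp, GeneRIF text); check that against the header line of the file you actually download. The sample record is made up:

```python
import io
from typing import Iterator, List, NamedTuple, TextIO


class GeneRIF(NamedTuple):
    tax_id: int
    gene_id: int
    pmids: List[int]
    last_update: str
    text: str


def parse_generifs(handle: TextIO) -> Iterator[GeneRIF]:
    """Yield one GeneRIF per data line, skipping '#' header lines and blanks."""
    for line in handle:
        if line.startswith("#") or not line.strip():
            continue
        # maxsplit=4 keeps any stray tabs inside the free-text column intact.
        tax_id, gene_id, pmid_list, last_update, text = line.rstrip("\n").split("\t", 4)
        pmids = [int(p) for p in pmid_list.split(",") if p]
        yield GeneRIF(int(tax_id), int(gene_id), pmids, last_update, text)


# Made-up record in the assumed layout.
SAMPLE = (
    "#Tax ID\tGene ID\tPubMed ID (PMID) list\tlast update timestamp\tGeneRIF text\n"
    "9606\t1\t12345,67890\t2010-01-01 00:00\tExample GeneRIF text.\n"
)

for rif in parse_generifs(io.StringIO(SAMPLE)):
    print(rif.gene_id, rif.pmids, rif.text)
```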

Optimise

Sounds like a Transformers yell… “Transformers, Optimise!”
Anyway, the thing to fix: all the CREATE TABLE … AS SELECT statements create tables without keys, and no key is definitely bad. Trying keys for the one that really matters: the gene_term_related_citations query. EXPLAIN seems happier, but the proof will be in the pudding (or [...]
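
The missing-key problem is easy to see in a query plan. A minimal sketch using Python’s built-in sqlite3 as a stand-in for MySQL (SQLite’s CREATE TABLE … AS SELECT likewise copies no indexes); the citations table, its columns and the index name are hypothetical, and only the gene_term_related_citations name comes from the post:

```python
import sqlite3


def plan(conn: sqlite3.Connection, sql: str, params=()) -> list:
    """Return the human-readable 'detail' column of EXPLAIN QUERY PLAN."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE citations (gene_id INTEGER, term_id INTEGER, pmid INTEGER)")
conn.executemany("INSERT INTO citations VALUES (?, ?, ?)",
                 [(g, g % 50, 1000 + g) for g in range(500)])

# Derived table built with CREATE TABLE ... AS SELECT: no keys come along.
conn.execute("CREATE TABLE gene_term_related_citations AS "
             "SELECT gene_id, term_id, pmid FROM citations")
query = "SELECT pmid FROM gene_term_related_citations WHERE gene_id = ?"
before = plan(conn, query, (42,))  # full table scan

# Add the key afterwards, then re-check the plan.
conn.execute("CREATE INDEX idx_gtrc_gene ON gene_term_related_citations (gene_id)")
after = plan(conn, query, (42,))   # index lookup

print("before:", before)
print("after: ", after)
```

In MySQL the equivalent fix would be ALTER TABLE … ADD INDEX after the copy, or declaring the index in the CREATE TABLE … SELECT statement itself (for example `CREATE TABLE t (INDEX (gene_id)) SELECT ...`), then checking the plan with EXPLAIN.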

A cunning plan

Time to get cracking on the optimisation. The two-pronged plan is to:

Expand the TF database with taxonids (and eventually load all of Entrez Gene)
Optimise queries using table indexes

Looks promising: I like orders-of-magnitude improvements.
Just hacked the Entrez Gene fetcher to extract the taxonid from the XML. I should look into [...]
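
Pulling the taxon ID out of an Entrez Gene XML record means finding the Dbtag whose Dbtag_db is “taxon”. A minimal Python sketch with xml.etree; the element path below is an assumption based on the NCBI Entrezgene XML layout, so check it against a real record. For full-size dumps, ET.iterparse would avoid loading everything into memory:

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Assumed location of the organism's database cross-references in a record.
ORG_DB_PATH = "Entrezgene_source/BioSource/BioSource_org/Org-ref/Org-ref_db"


def taxid_from_entrezgene(entry: ET.Element) -> Optional[int]:
    """Return the NCBI taxon ID of one <Entrezgene> element, or None."""
    org_db = entry.find(ORG_DB_PATH)
    if org_db is None:
        return None
    for dbtag in org_db.findall("Dbtag"):
        if (dbtag.findtext("Dbtag_db") or "").strip() == "taxon":
            value = dbtag.findtext("Dbtag_tag/Object-id/Object-id_id")
            if value is not None:
                return int(value.strip())
    return None


# Trimmed, hand-written record in the assumed layout.
SAMPLE_XML = """<Entrezgene>
  <Entrezgene_source>
    <BioSource>
      <BioSource_org>
        <Org-ref>
          <Org-ref_taxname>Homo sapiens</Org-ref_taxname>
          <Org-ref_db>
            <Dbtag>
              <Dbtag_db>taxon</Dbtag_db>
              <Dbtag_tag>
                <Object-id>
                  <Object-id_id>9606</Object-id_id>
                </Object-id>
              </Dbtag_tag>
            </Dbtag>
          </Org-ref_db>
        </Org-ref>
      </BioSource_org>
    </BioSource>
  </Entrezgene_source>
</Entrezgene>"""

print(taxid_from_entrezgene(ET.fromstring(SAMPLE_XML)))  # 9606
```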

Got back from CGDN, and the “genes to related articles” table-creation query was still stuck at “copying to tmp table”. Since that makes it a solid handful of days, I’m kind of scared of exactly how big this table will be. So, in the spirit of not filling up disk partitions, I’ve pulled the [...]
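
Before letting a multi-day query keep running, it’s worth comparing an estimate of the table size with the free space where MySQL writes temporary files. A minimal Python sketch; the row estimate is hypothetical, the system temp directory stands in for MySQL’s tmpdir (which `SHOW VARIABLES LIKE 'tmpdir'` reports), and the 2x safety factor is an arbitrary choice:

```python
import shutil
import tempfile


def has_room_for(path: str, estimated_bytes: int, safety_factor: float = 2.0) -> bool:
    """True if the filesystem holding `path` has safety_factor x estimated_bytes free."""
    free = shutil.disk_usage(path).free
    return free >= estimated_bytes * safety_factor


# Stand-in path: point this at the server's actual tmpdir.
TMPDIR = tempfile.gettempdir()
ESTIMATE = 10 ** 9 * 12  # hypothetical: a billion 12-byte rows

usage = shutil.disk_usage(TMPDIR)
print(f"{TMPDIR}: {usage.free / 1024 ** 3:.1f} GiB free of {usage.total / 1024 ** 3:.1f} GiB")
print("ok to keep running" if has_room_for(TMPDIR, ESTIMATE) else "not enough headroom, pull the query")
```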
