In cs/research/integrator
validation for the 1st degree “predictions”: get-gene-disease/validate/
validation for the 2nd degree predictions: cmp-profile/p-profile/validate/
Extract gene features from Entrez: mapviewer/
Archive for June, 2008
Recent Directory activity
Posted in Uncategorized, tagged directories on June 30, 2008
Region to “Gene” features
Posted in Uncategorized, tagged Entrez on June 30, 2008
Looks like we can call the map viewer to extract what we want (and probably download it to get what we need if we need to checkpoint it…)
URLs constructed like so, where you replace chromosome_number, range_start and range_end with the appropriate values:
http://www.ncbi.nlm.nih.gov/projects/mapview/maps.cgi?TAXID=9606&CHR=chromosome_number&MAPS=genes[range_start%3Arange_end]&CMD=TXT
or to get a plain text file output:
http://www.ncbi.nlm.nih.gov/projects/mapview/map_downld.cgi?taxid=9606&map=genes&chr=chromosome_number&from=range_start&to=range_end
More about this can be found in [...]
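A minimal sketch of building those two URLs in Python, so the chromosome and range can be filled in from a script rather than by hand. The function names and the example chromosome and coordinates are made up for illustration, and nothing is actually fetched here; whether the endpoints still respond is not something this sketch checks.

```python
from urllib.parse import urlencode

# Base path taken from the URLs above.
BASE = "http://www.ncbi.nlm.nih.gov/projects/mapview"


def map_view_url(chromosome, range_start, range_end, taxid=9606):
    """Build the maps.cgi URL (hypothetical helper name)."""
    # The colon between range_start and range_end is written as %3A,
    # exactly as in the template URL.
    maps = f"genes[{range_start}%3A{range_end}]"
    return f"{BASE}/maps.cgi?TAXID={taxid}&CHR={chromosome}&MAPS={maps}&CMD=TXT"


def map_download_url(chromosome, range_start, range_end, taxid=9606):
    """Build the map_downld.cgi URL for plain text output (hypothetical helper name)."""
    query = urlencode({
        "taxid": taxid,
        "map": "genes",
        "chr": chromosome,
        "from": range_start,
        "to": range_end,
    })
    return f"{BASE}/map_downld.cgi?{query}"


# Illustrative values only: chromosome 7, positions 1000 to 2000.
view_url = map_view_url("7", 1000, 2000)
download_url = map_download_url("7", 1000, 2000)
print(view_url)
print(download_url)
```

Keeping URL construction separate from fetching means the URLs can be logged or checkpointed before any request is made.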
Searching the Genome…
Posted in Entrez on June 30, 2008
How does one go about looking up the features near a chromosomal location? At first pass, this seems pretty trivial – grab a genome browser of choice, then go look. But if I want to get the humans out of the loop and do this automagically, it’s not so trivial.
First pass is to ask Google – [...]
ROC curves et al…
Posted in Uncategorized on June 23, 2008
Starting to plot the usual suspects… a little worried initially since the results were… displeasing, but it turned out that ROCR expects high scores to mean the positive class (low values = bad), whereas we’re in p-value land, where small = good, large negative = good. So the scores need their direction flipped before plotting. Getting AUCs in the 60-and-change percent range, which is okay, but not overwhelming.
Going to back-port this to [...]
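To make the score-direction issue concrete, here is a small Python sketch (not ROCR, just a plain pairwise AUC written for illustration) showing how the AUC changes when p-values are negated so that "higher = better". The p-values and labels are invented toy data, not results from this project.

```python
def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive has the higher score; ties count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): small p-value = strong support.
p_values = [0.001, 0.02, 0.3, 0.04, 0.8]
labels = [True, True, True, False, False]

# Feeding raw p-values treats large p as "good", which is backwards.
raw_auc = auc(p_values, labels)

# Negating the p-values restores the "higher = better" convention.
flipped_auc = auc([-p for p in p_values], labels)

print(raw_auc, flipped_auc)
```

With no ties, the two values sum to 1, so a suspiciously bad AUC (well below 0.5) is a hint that the scores are pointing the wrong way.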
Term Prediction
Posted in Uncategorized on June 20, 2008
Looking at the OMIM validation set, it seems that I didn’t expand it with the term parents (although that might “inflate” the results?). Anyway, I’m just about done with this – should back-port some of it to the 1st order predictions once I get ROC curves generated.
File comparison
Posted in Uncategorized on June 19, 2008
And the minute you give up hope and are about to write your own software, you find what you need. The shell command
comm
takes 2 sorted files and prints 3-column output, placing each line in one column:
line is only in file 1, line is only in file 2, line is in both files
So if our gene-disease files are sorted on the [...]
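For reference, the same three-way split can be sketched in Python as a merge over two sorted lists. This is an illustration of what comm reports (lines only in the first file, lines only in the second, lines in both), not a replacement for it; the function name and the gene/term identifiers are made up, and the comparison uses Python's plain string ordering, which may differ from the locale-dependent ordering comm uses.

```python
def comm_columns(lines_a, lines_b):
    """Split two sorted lists of lines into (only_a, only_b, both)."""
    only_a, only_b, both = [], [], []
    i = j = 0
    while i < len(lines_a) and j < len(lines_b):
        if lines_a[i] == lines_b[j]:
            both.append(lines_a[i])
            i += 1
            j += 1
        elif lines_a[i] < lines_b[j]:
            only_a.append(lines_a[i])
            i += 1
        else:
            only_b.append(lines_b[j])
            j += 1
    # Whatever remains in either list has no partner in the other.
    only_a.extend(lines_a[i:])
    only_b.extend(lines_b[j:])
    return only_a, only_b, both


# Hypothetical tab-separated gene/term lines, already sorted.
predicted = ["g1\tt1", "g2\tt1", "g3\tt2"]
validated = ["g2\tt1", "g2\tt3"]

only_predicted, only_validated, shared = comm_columns(predicted, validated)
print(only_predicted, only_validated, shared)
```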
Validation Set Building
Posted in Uncategorized on June 18, 2008
Input: file containing geneID-term for validated gene-disease associations in OMIM
Output: file of all geneID-term associations for the gene-disease associations + parent terms, with TRUE/FALSE depending on whether this is supported by the input
Plan is
1) Join the Input file with the mesh-child file to get the input gene-disease associations plus their parent terms – the gene-term-child validation file
2) Sort [...]
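A rough in-memory sketch of the Input-to-Output mapping described above, written in Python. It covers only what the Input/Output description states: each validated gene-term pair is kept, each parent of its term is added for the same gene, and every row is flagged TRUE or FALSE depending on whether that exact pair was in the input. The function name, the dictionary layout of the child-to-parent mapping, the toy identifiers, and the choice to add only direct parents are all assumptions; the actual plan works on sorted files and its later steps are not shown here.

```python
def build_validation_set(gene_terms, child_to_parents):
    """Return sorted (gene, term, supported) rows.

    gene_terms: iterable of (gene_id, term) pairs from the validated input.
    child_to_parents: dict mapping a term to its direct parent terms
                      (assumed layout of the mesh-child data).
    """
    supported = set(gene_terms)
    rows = set()
    for gene, term in supported:
        rows.add((gene, term))
        # Add each direct parent of the term for the same gene.
        for parent in child_to_parents.get(term, ()):
            rows.add((gene, parent))
    # TRUE only when the exact pair appeared in the validated input.
    return sorted((gene, term, (gene, term) in supported) for gene, term in rows)


# Hypothetical toy data.
validated_pairs = [("g1", "t_child"), ("g2", "t_parent")]
parents = {"t_child": ["t_parent"]}

validation_rows = build_validation_set(validated_pairs, parents)
for gene, term, flag in validation_rows:
    print(gene, term, "TRUE" if flag else "FALSE", sep="\t")
```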
Validating the comparison results
Posted in Uncategorized on June 17, 2008
Starting to notice that I’m losing track of what is in which directory. I’m going to try leaving breadcrumbs on this blog.
Working in cmp-profile (comparing gene profiles to disease profiles)
cmp-profile/p-profile (using p-values rather than normalised document counts)
Currently have the nr-genes vs all Disease terms (all human genes are running on the cluster)
In the cmp-profile/p-profile/validate directory, [...]