Solution for the authors data being too big
- Use only one score, and only keep the “highest k” – DO NOT save it all
- To IMPLEMENT: modify the profile comparison code to store only the top k lines
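Rough sketch of the "keep only the highest k" idea, assuming each comparison yields a (score, line) pair. The name top_k_matches and the pair layout are made up for illustration, they are not from the actual profile comparison code. A size-k min-heap means everything else is thrown away as we go instead of being saved:

```python
import heapq

def top_k_matches(scored_lines, k):
    """Return the k highest-scoring (score, line) pairs, best first.

    scored_lines: any iterable of (score, line) pairs (hypothetical format).
    Only k items are held in memory at any time.
    """
    heap = []  # min-heap; heap[0] is the weakest match kept so far
    for seq, (score, line) in enumerate(scored_lines):
        # seq breaks ties so that lines themselves are never compared
        item = (score, seq, line)
        if len(heap) < k:
            heapq.heappush(heap, item)
        elif score > heap[0][0]:
            # New score beats the weakest kept match: swap it in
            heapq.heapreplace(heap, item)
    return [(score, line) for score, _, line in sorted(heap, reverse=True)]
```

Since scored_lines can be a generator, the full score list never has to exist on disk or in memory.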
Need to overlap the pharma list with the drugbank list to make sure we’re not losing too many entries
- Messed up something here it seems – only 302 of the drugbank generic names map to chem terms (ignoring case)
- Actually – only 397 of the chem terms from chem-mesh-refs.txt are mapping
- and only 827 of the terms in all-chem-refs.txt are mapping
- CAS number matching gets 794 records
- SIGH…looks like ~/drugcards-sorted is not properly sorted. BOO
- OK NEW STATS
- all-chem-refs matches 2666 of the drugcards
- 1029 of pharma-chem matches the drugcards
- 1731 are in all-chem but not in pharma-chem
- 94 are in pharma-chem but not in all-chem…WAITASEC WHAT??
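- DOH – have to be careful with joins and whitespace: by default join splits fields on blanks, so multi-word names get matched on their first word only. Use the -t param to set the field separator to a character that never appears in the data (ie NO BREAK FIELD, the whole line is the key)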
- 910 in all-chem match drugcards
- 803 in pharma-chem match drugcards
- surprisingly – 27 pharma-chem are NOT in all-chem?? Isn’t pharma-chem a subset of all-chem? or is all-chem-refs something else entirely vs pharma-chem-refs?
- INTERESTING – pharma-chem is NOT a subset of all-chem, although in the pipeline it is only used as a filter, so it doesn’t really matter. Weird though.
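A sketch of the same overlap counts done with sets instead of sort + join, which sidesteps both the sort-order problem and the whitespace problem. It assumes each list is one name per line, already read into Python lists. The function names and the normalization rule (collapse whitespace, ignore case) are my assumptions, not what the pipeline actually does:

```python
def normalize(name):
    """Collapse runs of whitespace and lowercase, so the whole name is the key."""
    return " ".join(name.split()).lower()

def compare_name_lists(drug_names, all_chem, pharma_chem):
    """Count overlaps between the drug names and the two chem term lists."""
    drugs = {normalize(n) for n in drug_names if n.strip()}
    all_set = {normalize(n) for n in all_chem if n.strip()}
    pharma_set = {normalize(n) for n in pharma_chem if n.strip()}
    return {
        "all_chem_in_drugcards": len(all_set & drugs),
        "pharma_chem_in_drugcards": len(pharma_set & drugs),
        # Non-empty here means pharma-chem is not a subset of all-chem
        "pharma_not_in_all": sorted(pharma_set - all_set),
        "all_not_in_pharma": sorted(all_set - pharma_set),
    }
```

Usage on toy data (these names are invented, not taken from the real files):

```python
stats = compare_name_lists(
    ["Aspirin", "Acetylsalicylic  Acid ", "Ibuprofen"],
    ["aspirin", "caffeine"],
    ["acetylsalicylic acid", "ibuprofen", "aspirin"],
)
```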