WikiProject Chemicals and WikiProject Pharmacology are validating the content in the infoboxes ((chembox)) and ((drugbox)). Values in the infobox are compared with values reported in literature, and when the values match, the revision is stored in the index for chembox and the index for drugbox, respectively. This is typically done for values that are 'immutable' (e.g., the boiling point of a chemical compound: the boiling point of water under standard conditions is 99.98 °C, and there is no plausible reason to suspect it will change).
We are verifying the CAS Registry number (|CASNo= in ((chembox)), |CAS_number= in ((drugbox))), ChemSpiderID, Unique Ingredient Identifier (UNII), InChI, KEGG, and ChEMBL by comparison with the data on the CAS website, ChemSpider and the FDA's UNII Search Service, as well as with lists supplied by (CAS number, ChemSpiderID, InChI, UNII, ChEMBL and ChEBI) or downloaded from these websites (KEGG, DrugBank). In the meantime, we are trying to add, update and/or check a number of other identifiers (InChI, InChIKey) by comparison of the data with the ChemSpider website.
CheMoBot is following changes to these articles, and is set up to update the infoboxes. When it detects changes to values, it will change parameters in the infobox accordingly. These parameters are used by the template to show what the status of each field in the box is.
If you encounter a page with a ((chembox)) or ((drugbox)) that shows an N, then please check if the value is wrong (in which case, it can just be changed back to the value in the verified revision; the bot will do the rest), or if there is a mistake in the verified revision (if so, it may need an update of the index; if you need help with that, please ask the appropriate wikiproject).
Verification – tagging references
CheMoBot adds a template to a _Ref parameter (e.g. for CASNo, CASNo_Ref will be filled with ((cascite|correct|XXX))) when the bot finds the field correct. The first parameter of the template is 'correct' or 'changed', and the box will show a tick or a cross accordingly on CASNo. The second parameter is a field that contains a reference for 'where' the parameter was verified. As we are at the moment verifying all fields against the CAS commonchemistry.org site, the bot replaces XXX with 'CAS' (i.e., ((cascite|correct|CAS))). When using another place to verify the CASNo, please adapt this parameter accordingly; the bot will try to retain this field throughout. When there are significantly more verifications against places other than commonchemistry.org, I will instruct the bot to fill the field by default with ((cascite|correct|??)) or something similar.
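The tagging rule described above can be sketched as follows. This is only an illustration and not CheMoBot's actual code: the function name `cascite_tag` and the whitespace handling are assumptions, and the template text is written in wiki brace syntax.

```python
def cascite_tag(current_value, verified_value, source="CAS"):
    """Build the text for a _Ref parameter (hypothetical helper).

    'correct' is used when the value in the infobox equals the verified
    value, 'changed' otherwise. `source` records where the value was
    verified; 'CAS' mirrors the default described in the text.
    """
    same = current_value.strip() == verified_value.strip()
    status = "correct" if same else "changed"
    return "{{cascite|%s|%s}}" % (status, source)


print(cascite_tag("60-33-3", "60-33-3"))
print(cascite_tag("60-33-4", "60-33-3"))
```

The second argument of the template is kept as a separate parameter so that a verification against another source can be recorded without touching the status.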
Method of work
Our approach is to start by checking that the CAS registry number and the structure match with the name. This will be used as a foundation upon which we can build a broader validation effort. Once we have the structure verified, we have the formula, and hence the molar mass, and we can also generate other machine representations such as SMILES, InChI and InChIKey.
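As an illustration of the step from a verified formula to a molar mass, here is a minimal sketch. It assumes a flat formula string such as C3H7NO2 (no parentheses, hydrates or charges), and it carries only a small, abridged table of atomic weights; the function name `molar_mass` is hypothetical and is not part of any project tool.

```python
import re

# Abridged standard atomic weights in g/mol; only a few elements are listed.
ATOMIC_MASS = {
    "H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999,
    "Na": 22.990, "S": 32.06, "Cl": 35.45,
}


def molar_mass(formula):
    """Sum atomic weights for a flat formula like 'C3H7NO2'."""
    if not re.fullmatch(r"(?:[A-Z][a-z]?\d*)+", formula):
        raise ValueError("unsupported formula: %r" % formula)
    total = 0.0
    for symbol, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        # A missing count means one atom; unknown symbols raise KeyError.
        total += ATOMIC_MASS[symbol] * (int(count) if count else 1)
    return total


print(round(molar_mass("C3H7NO2"), 3))  # alanine
```

A value computed this way can be compared with the MolarMass field, allowing for small differences caused by rounding of atomic weights.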
First 1000
After our IRC meeting on January 13, 2009, we used an Excel file to validate the first 1000 entries from the CAS XML file. This is available to project members here, on the password-protected site. Meanwhile, User:Physchim62 validated the inorganics separately, and these can be found in the CAVer file.
The work
We are now beginning to work through the list of "problem articles" found by User:Beetstra, and listed at User:Beetstra/CASFoundCorrect. A description of the process will be added soon.
Notes
Different CAS Registry Numbers[1] are used for each form of a substance. For example, something simple like alanine will have one CAS# for the D form, another for L, another for "unspecified" and a fourth one for racemic. There would be another four CAS#s for the hydrochloride, four for the (1:1) sulfate, four for the (2:1) sulfate, etc. It is very important that we match the correct form CAS# to our Chemboxes!
Be aware that CAS uses an unusual system for representing some formulae, which may seem "wrong" to us. These involve describing salts such as sodium nitrate as HNO3·Na, and organic salts follow a similar system. Do not use such formulae on WP, but they are not "wrong" since they are merely a representation, not a formal structure. This also results in incorrect MolarMass in the FW section of the SDF file for salts.
For complex chiral structures, such as bleomycin, which may be drawn very differently in WP than in Common Chemistry, I found it best to assign R/S for each center and compare that way. (And yes, Farseer drew bleomycin perfectly!)
The CAS No. (that is, "CAS Registry Number"[1]) in a Chembox will receive a green tick (check mark) once ((cascite)) is added. This does not happen yet in the Drugbox (there is no change at present), but we hope to enable a similar system there too, if WP:PHARM is in agreement.
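Before comparing a CAS Registry Number against a source, it can help to rule out simple typing errors. The final digit of a CAS number is a check digit: the remaining digits, read from right to left, are multiplied by 1, 2, 3 and so on, and the sum modulo 10 must equal the check digit. The sketch below (the function name is hypothetical) only catches malformed numbers; it cannot tell whether a valid number belongs to the correct form of the substance.

```python
import re


def cas_check_digit_ok(cas_number):
    """Return True if the string is well formed and its check digit matches."""
    match = re.fullmatch(r"(\d{2,7})-(\d{2})-(\d)", cas_number)
    if not match:
        return False
    body = match.group(1) + match.group(2)
    # Weight the digits 1, 2, 3, ... starting from the rightmost one.
    weighted = sum(int(d) * w for w, d in enumerate(reversed(body), start=1))
    return weighted % 10 == int(match.group(3))


print(cas_check_digit_ok("60-33-3"))
print(cas_check_digit_ok("60-33-4"))
```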
Fields to check/upload
Chemboxes
Check structure, CAS no., Formula, MolarMass.
Notes:
1. The bot 'divides' the fields into two sets, watched and unwatched. All changes are reported, but the watched fields are the ones we really want to take care of: those are the fields that contain hardcore, verifiable data that are very unlikely to change (such as the boiling point of water, the CAS number of benzene, or the number of carbons in glucose). N.B. the list of 'watched' fields may need to be updated.
2. The bot regards an empty field as 'unknown'. It will report changes to this field, but will assign a lower 'warning level' to it.
3. Things between <!-- and --> are 'comments': they can be saved and appear in the edit box, but they do not produce anything visible on the rendered page.
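Notes 1 and 2 can be pictured with a small sketch. Everything specific here is an assumption made for illustration: the set of watched fields, the numeric levels (higher meaning more urgent), and the choice to lower the level when either the old or the new value is empty. The bot's real field list and scale are not given on this page.

```python
# Hypothetical watched set, loosely based on the fields listed for checking.
WATCHED_FIELDS = {"CASNo", "Formula", "MolarMass"}


def warning_level(field, old_value, new_value):
    """Return None if nothing changed, else a hypothetical warning level."""
    old, new = old_value.strip(), new_value.strip()
    if old == new:
        return None
    level = 2 if field in WATCHED_FIELDS else 1
    # An empty field counts as 'unknown', so the change is reported
    # with a lower level (assumed to apply to either side being empty).
    if not old or not new:
        level -= 1
    return level


print(warning_level("CASNo", "60-33-3", "60-33-4"))
print(warning_level("CASNo", "", "60-33-3"))
```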
When a 'better' version of a page comes up, change the number on the page. If there are two revids for the same page, the bot uses the one closest to the bottom of the index page (the page gets parsed top to bottom, replacing values if duplicates occur).
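The "last entry wins" behaviour follows naturally from parsing top to bottom into a mapping. The sketch below assumes a made-up index format of one `Page title|revid` pair per line; the real index pages may be laid out differently, and `parse_index` is a hypothetical name.

```python
def parse_index(text):
    """Map page titles to verified revids; later lines replace earlier ones."""
    verified = {}
    for line in text.splitlines():
        line = line.strip()
        if "|" not in line:
            continue  # skip blank lines and anything not in the assumed format
        title, revid = line.rsplit("|", 1)
        if not revid.strip().isdigit():
            continue
        verified[title.strip()] = int(revid)
    return verified


index_text = "Water|100\nBenzene|200\nWater|300"
print(parse_index(index_text))
```

With this input the entry for Water ends up as 300, the value closest to the bottom, which is why adding a newer revid lower down is enough to supersede an older one.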
Linoleic acid: WP says cis,cis; CAS says trans,trans ('linoelaidic acid'); the whole world says linoleic acid is 60-33-3, including the spreadsheet and Sigma.
This is very strange, it is trans,trans in the union file and cis,cis in the wikichem file (I have been using the union file to verify CAS numbers). I need to look into this. Ambix (talk) 12:47, 12 February 2009 (UTC)
Glucose 1-phosphate: One chiral center is not specified (should be up to match CAS), probably a result of copying the glucose skeleton, in which this atom is not chiral?
Cholecalciferol: The structure diagram has one carbon atom with two wedge bonds attached, making verification difficult (the stereochemistry should be R here, and I think it is)
Vitamin B12: The structure diagram does not adequately specify the stereochemistry of the Corrin ring
Asparagine: The structure diagram has one carbon atom with two wedge bonds attached, making verification difficult (the stereochemistry should be S here, and is)
Histidine: Structure needs to show stereochemistry
Veratridine: still to be verified, the structure displays badly in ChemFileBrowser
Sodium lactate: old-style chembox; note that CASRN is for unspecified stereochemistry
Valine: The structure diagram has one carbon atom with two wedge bonds attached, making verification difficult (the stereochemistry should be S here, and is)
Threonine: The structure diagram does not specify the stereochemistry at the two chiral centres (should be 2S,3R)
Endrin: The structure diagram appears to show the endo-isomer whereas the CASRN is for the exo-isomer (or vice versa, I never was very good at this particular bit of nomenclature! in any case, it's not the same compound!) We should recheck with Dieldrin (CASRN [60-57-1]) as well. Neither compound has the stereochemistry correctly specified.
I've rechecked Dieldrin, adding the implicit hydrogens to the WP structure and drawing it in ChemSketch. I also copied the CAS structure exactly and had the program assign stereo labels. They match, which leads me to think my initial verification is OK. It should perhaps be noted that while the carbon skeletons look to be the same projection, WP is from above and CAS (it turns out) is from below. If you are still unhappy, could you describe your assignment in more detail? I'll try the ChemSketch method with Endrin and hopefully we can compare notes. Ambix (talk) 23:27, 6 February 2009 (UTC)
I have checked Endrin with the same process and it does not match. There is an older version of this image Endrin.png and this does match. Given the difficulties of transposing a 3D structure to more conventional form it would probably be better to have a more conventional structure as well for compounds like this but I would suggest we avoid removing 3D structures providing it is possible to validate them. I will investigate further.
I suggest that for our validated structure on such compounds, we should explicitly show the stereochemistry of each chiral centre, which is not the case at present on Endrin and Dieldrin (even if a knowledgeable chemist can figure out what it must be from the diagram). That doesn't necessarily mean changing the structures in the chemboxes (our images for inorganics don't always give a clear idea of the structure), but we should insist on the chembox information being correct and not misleading, and that the full details be available in the article (maybe in a separate image). Physchim62 (talk) 23:23, 9 February 2009 (UTC)
Trimethylaluminium: WP shows the dimer, CAS the monomer. Is this significant? Will CAS have a dimer listed?
Camphor: Both the WP page and the CAS number are for unspecified stereoisomers. However, if we follow the naturally occurring rule, should the WP page be changed to the natural isomer and the unspecified CAS number be relegated to an 'other'?
2,3-Dimethylbutane: redirects to Dimethylbutane; only 2,2 has an article. The generic name is now a DAB, and both 2,2 and 2,3 have articles.
701–800
801–900
901–1000
Inorganics
The 677 "inorganics" (neutral compounds without C–C or C–H bonds) have now all been checked. 496 entries gave a perfect match, 74 entries had some sort of problem in the article (often minor and already fixed) and 107 entries had no appropriate corresponding article on Wikipedia. A full report will be available in due course.
Elements and ions
These will require special treatment: please contact Physchim62 for more details.