Small error on the Tangut Homophones 𗙏𘙰 Lookup Tool
Hi - I wasn't really sure where the best place to contact you was (as I don't have Twitter or your email address), but I've noticed a small error on your Tangut Homophones 𗙏𘙰 Lookup tool and just wanted to flag it to you.
Currently, Tongyin A 04B48 is associated with 𗄱 (U+17131; LFW 2354). However, page 78 of L2/14-209R actually noted that 𗄲 (U+17132; no LFW ref) is the correct form, and recommended the separate encoding of that character on that basis. On the other hand, the edition of Tongyin B displayed seems to give 𗄹 (U+17139; LFW 1947), which is also noted in L2/14-209R.
It's pretty clear that they're all just variants of the same character, but I thought it was worth flagging this up with you given that this specific point was discussed during the encoding process. Theknightwho (talk) 13:54, 27 April 2022 (UTC)
Hi, many thanks for the error report. The tool was originally designed before Tangut was included in Unicode, so it is based on LFW numbers internally. When Tangut was added to Unicode, I hacked the code to add support for searching by Unicode characters, but I didn't do it as thoroughly as I should have. Anyway, I have spent a couple of hours this evening improving the tool, so that it shows the correct Unicode character for each edition of Tongyin. If you try it now, it should work correctly for Tongyin A 04B48, LFW 2354, and 𗄲. Please let me know if you find any more issues. BabelStone (talk) 18:36, 27 April 2022 (UTC)
I have found a single discrepancy (which may be a bit faffy to fix): 𗴯 (U+17D2F; LFW 5999) is given for both Tongyin A 26B65 and Tongyin B 29B16. When it comes to Tongyin B, that only applies to edition B2, whereas editions B4 and B5 have 𗴰 (U+17D30; LFW 37). The new tool gives 𗴯 at the head of each edition, and is therefore incorrect for those two.
I'm not sure if that's the only discrepancy that exists between the various copies of B, but it's the only one I've found. Theknightwho (talk) 18:51, 27 April 2022 (UTC)
Ah yes, that was a case I looked at today, but was too lazy to do anything about, so thank you for nudging me to do the right thing here. The only other similar case (with different encoded character variants for different versions of the B edition) that I have documented is Tongyin B 29B12 which has 𘂧 and 𘂤 (the latter is the correct form used almost without exception in printed and manuscript texts). I have now released a new version of the tool where I special-case these two particular examples. Additionally there are nearly twenty examples where the glyph form of one or more B versions does not exactly correspond to the encoded character, which I will leave alone for now (in the future they could be represented using IVS). Please let me know if you notice any more errors or discrepancies. BabelStone (talk) 21:04, 27 April 2022 (UTC)
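For anyone following the thread, the special-casing described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the tool's actual code: the names `DEFAULT_CHAR`, `EDITION_OVERRIDES`, and `char_for` are hypothetical, and only the Tongyin B 29B16 case is filled in, since the thread does not say which B version has which form at 29B12.

```python
# Hypothetical sketch of per-edition special-casing; not the tool's real code.
# Code points are the ones quoted in this thread.

# Default encoded character for each (text, position) entry.
DEFAULT_CHAR = {
    ("A", "26B65"): "\U00017D2F",  # LFW 5999
    ("B", "29B16"): "\U00017D2F",  # LFW 5999 (correct for edition B2)
}

# Exceptions where a specific B version has a different encoded character.
EDITION_OVERRIDES = {
    ("B", "29B16", "B4"): "\U00017D30",  # LFW 37
    ("B", "29B16", "B5"): "\U00017D30",  # LFW 37
}


def char_for(text, position, edition=None):
    """Return the character for an entry, preferring an edition-specific override."""
    default = DEFAULT_CHAR[(text, position)]
    return EDITION_OVERRIDES.get((text, position, edition), default)


def code_point(char):
    """Format a single character as a U+XXXX string."""
    return f"U+{ord(char):04X}"
```

With this layout, `char_for("B", "29B16", "B2")` falls back to the default entry, while the B4 and B5 lookups pick up the override, so further exceptions would only need another row in `EDITION_OVERRIDES`.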
@BabelStone Thanks - that's really helpful (particularly given how difficult the dense Tangut characters can be to parse - I don't think I'd have spotted that).
The only other thing that occurs to me (though not an issue with the lookup tool) is that perhaps 𗭹 should be disunified (Wenhai L83.251 and L84.232). Only Han Xiaomang has noted that it appears twice, but my impression is that while both mean "drum", only L84.232 can also mean thunderclap (though I say that very tentatively, as it's very likely that I've misunderstood something somewhere). Tangut does seem to have a habit of relying on minute differences in character forms, though, which is why it feels plausible. Theknightwho (talk) 21:53, 27 April 2022 (UTC)
I certainly agree that minute differences in character forms can be crucial when it comes to Tangut, and we should certainly pay attention to subtle nuances. However, in this particular case I have not yet found any evidence that it is anything other than a mistaken duplication in Wenhai of essentially the same dictionary entry. The character forms appear to be identical, and the definition in both entries is identical except for the presence/omission of a genitive particle, and both entries include the definition 𗘶𘛻𗙏𗧊(𗗙)𗧘𘃞 "it means the sound produced by a thunderclap". BabelStone (talk) 23:12, 27 April 2022 (UTC)