Saturday, November 29, 2008

Re: e-gold and e-go1d

On Nov 29, 2008, at 9:18 AM, James A. Donald wrote:
> The algorithm is to map all lookalike glyphs to
> canonical glyphs
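
For concreteness, a minimal Python sketch of the canonical-mapping idea quoted above. The lookalike table and the name skeleton are my own assumptions for illustration; a real table would be much larger and would depend on the font in question.

```python
import unicodedata

# Hypothetical lookalike table (assumption): each glyph maps to the
# canonical glyph it can be mistaken for.
CANONICAL = str.maketrans({
    "1": "l",
    "I": "l",
    "0": "o",
    "O": "o",
})

def skeleton(name: str) -> str:
    """Fold a name to a canonical form so that lookalikes compare equal."""
    # NFKC first folds compatibility forms such as fullwidth letters.
    folded = unicodedata.normalize("NFKC", name)
    # Map lookalikes before lowercasing so that "I" becomes "l", not "i".
    return folded.translate(CANONICAL).lower()

print(skeleton("e-gold") == skeleton("e-go1d"))
```

Two names collide when their skeletons are equal, so the check is only as good as the table behind it.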

The definition of lookalike glyphs depends on the choice of font and
variant, and Unicode wraps the whole problem in a lovely layer of
hell. If I had to do this, I'd investigate rendering both strings in
the (same) target font and then quantifying the amount of overlap in
the bitmaps, as e.g. SWORD does for TLDs:

<http://icann.sword-group.com/icann-algorithm/Default.aspx>
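
A rough sketch of the bitmap-overlap idea, not SWORD's algorithm (which I haven't seen). The 5x3 glyphs below are invented stand-ins; a real check would rasterize both strings in the actual target font. The overlap measure (shared ink pixels over total ink pixels) is likewise just one plausible choice.

```python
# Toy 5x3 glyph bitmaps, invented for illustration only (assumption).
GLYPHS = {
    "1": (".#.", "##.", ".#.", ".#.", "###"),
    "l": (".#.", ".#.", ".#.", ".#.", ".#."),
    "o": ("...", "###", "#.#", "#.#", "###"),
    "0": ("###", "#.#", "#.#", "#.#", "###"),
}

def pixels(text):
    """Return the set of (row, col) ink pixels for text laid out left to right."""
    ink = set()
    for i, ch in enumerate(text):
        for r, row in enumerate(GLYPHS[ch]):
            for c, cell in enumerate(row):
                if cell == "#":
                    # Each glyph is 3 columns wide plus 1 column of spacing.
                    ink.add((r, i * 4 + c))
    return ink

def overlap(a, b):
    """Shared ink pixels divided by total ink pixels, from 0.0 to 1.0."""
    pa, pb = pixels(a), pixels(b)
    union = pa | pb
    return len(pa & pb) / len(union) if union else 1.0

print(overlap("lo", "10"), overlap("lo", "ol"))
```

With these toy glyphs, "lo" scores closer to "10" than to "ol"; where to put the threshold is a separate question.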

The above is proprietary; NIST's Paul Black has Python code available
for a slightly enhanced Levenshtein distance:

<http://hissa.nist.gov/~black/GTLD/>
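
To illustrate the general flavor (this is not Black's code), here is a sketch of a Levenshtein distance in which substituting one visually similar character for another costs less than an ordinary substitution. The pair table and the 0.25 cost are assumptions of mine.

```python
# Hypothetical pairs treated as visually similar (assumption).
SIMILAR = {frozenset("l1"), frozenset("o0"), frozenset("lI")}

def sub_cost(a, b):
    """Cost of replacing a with b: free if equal, cheap if lookalikes."""
    if a == b:
        return 0.0
    return 0.25 if frozenset((a, b)) in SIMILAR else 1.0

def visual_distance(s, t):
    """Levenshtein distance with discounted lookalike substitutions."""
    # prev[j] holds the distance between the processed prefix of s and t[:j].
    prev = [float(j) for j in range(len(t) + 1)]
    for i, a in enumerate(s, start=1):
        cur = [float(i)]
        for j, b in enumerate(t, start=1):
            cur.append(min(
                prev[j] + 1.0,                   # delete a
                cur[j - 1] + 1.0,                # insert b
                prev[j - 1] + sub_cost(a, b),    # substitute or match
            ))
        prev = cur
    return prev[-1]

print(visual_distance("e-gold", "e-go1d"), visual_distance("e-gold", "e-gord"))
```

A small distance then flags a likely spoof: swapping "l" for "1" costs a quarter of what an unrelated substitution does.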

--
Ivan Krstić <krstic@solarsail.hcs.harvard.edu> | http://radian.org

---------------------------------------------------------------------
The Cryptography Mailing List
Unsubscribe by sending "unsubscribe cryptography" to majordomo@metzdowd.com
