Hello,
thanks for the great app and especially its Unicode support.
I've started writing an import script for a Japanese site (EUC-JP encoding, CP:20932) and I've noticed a very strange thing.
The raw page saved automatically to the file 'page.html' is OK. But when I process the page with the script (even when I output the HTML string right at the start of the script), it becomes broken in some places. For example:
<a href="/digital/videoa/-/detail/=/cid=41djk012/">猥らなほどに悩ましい 古都ひかる</a></p>
becomes
<a href="/digital/videoa/-/detail/=/cid=41djk012/">猥らなほどに悩ましい 古都ひか・E/a></p>
Besides changing the text, this destroys the whole HTML structure.
Other examples, out of many more:
~ -> ?
奥さん! -> ・E気鵝・
女 2 -> ・E2
For many hours I've been trying different codepages for the script (20932, autodetect, UTF-8), but garbled text or errors like this one are the only results.
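One check that might help narrow this down is to strict-decode the raw bytes outside the app and see whether they are valid EUC-JP at all. The following is only a rough sketch in Python: the helper name `first_decode_error` is made up for illustration, and it assumes Python's `euc_jp` codec is close enough to what the site sends (it is not necessarily identical to Windows code page 20932). The sample bytes are built from the link text above, with one byte deliberately dropped to imitate a character that swallows the following `<`.

```python
from typing import Optional, Tuple


def first_decode_error(raw: bytes, encoding: str = "euc_jp") -> Optional[Tuple[int, bytes]]:
    """Return (offset, nearby bytes) for the first sequence that fails
    strict decoding, or None if the whole input decodes cleanly.

    Hypothetical helper for illustration; not part of the app.
    """
    try:
        raw.decode(encoding)  # strict mode raises instead of substituting
    except UnicodeDecodeError as exc:
        lo = max(0, exc.start - 8)
        return exc.start, raw[lo:exc.end + 8]
    return None


# Build a small sample from the link quoted above.
prefix = '<a href="/digital/videoa/-/detail/=/cid=41djk012/">猥らなほどに悩ましい 古都ひか'.encode("euc_jp")
ru = "る".encode("euc_jp")  # two bytes in EUC-JP
suffix = b"</a></p>"

intact = prefix + ru + suffix
damaged = prefix + ru[:1] + suffix  # second byte of the last character lost

print(first_decode_error(intact))
print(first_decode_error(damaged))
```

To run it against the real data, read the saved file in binary mode, for example `open("page.html", "rb").read()`, and pass the result to the helper. If the saved page decodes cleanly while the string seen by the script does not, that would suggest the damage happens somewhere between loading and the script, rather than in the page itself.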
Has anyone had a similar experience? Could there be a bug in the script parser? Is there any workaround?
Thank you very much in advance.