View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update |
---|---|---|---|---|---|
0004084 | mantisbt | localization | public | 2004-07-14 02:31 | 2009-06-23 15:26 |
Reporter | astax | Assigned To | siebrand | ||
Priority | normal | Severity | feature | Reproducibility | always |
Status | closed | Resolution | fixed | ||
Target Version | 1.2.0rc1 | Fixed in Version | 1.2.0rc1 | ||
Summary | 0004084: [all lang] Use UTF-8 codepage | ||||
Description | When Mantis is used in several languages at once, the different charsets used by languages with non-Latin scripts cause problems. For example, if I type something in Russian with the Russian locale selected in Mantis, my input is saved in the Windows-1251 codepage. When someone using the English locale in Mantis looks at my comments, he sees them interpreted as iso-8859-1 or windows-1252 and obviously cannot read my text. The situation is even worse when I use the English locale and enter Russian text: depending on the browser, either I cannot enter Russian characters at all, or they are put into the field as UTF-8 characters. Either way, entering Russian text is impossible. The only correct fix I see is to add an option to use the UTF-8 codepage globally. It probably should not be forced on everything, but at least we need the option. Any comments? | ||||
Tags | No tags attached. | ||||
parent of | 0008352 | closed | grangeway | Upgrading from 1.1.0a3 to 1.1.0a4 generating non correct visualization for 'é, è, à, ù, ..' characters |
parent of | 0008230 | closed | siebrand | Character encoding in Mantis 1.1.0a4 on the Bugtracker site |
parent of | 0007472 | closed | siebrand | [th] Thai chars are broken on Excel export; column header doesn't show |
parent of | 0009118 | closed | siebrand | Bugtracker does not handle UTF-8 formatted text |
has duplicate | 0006144 | closed | achumakov | Czech: encoding problem (Iiso <-> utf8) |
has duplicate | 0005401 | closed | jlatour | Cyrillic characters encoding problem |
has duplicate | 0006226 | closed | ryandesign | Invalid encoding of pages |
has duplicate | 0007406 | closed | ryandesign | Handing of accents not working using UTF8 |
related to | 0004085 | closed | siebrand | The system complains about missing required fields |
related to | 0004195 | closed | siebrand | Mails do not show national characters correctly |
related to | 0003812 | closed | siebrand | [jp] Mantis send broken email when lang=Japanese_euc |
related to | 0005767 | closed | grangeway | Win32 MySQL 4.1.12a-nt default character set |
related to | 0006536 | closed | | Mantis display a error infomation when create a chinese project |
related to | 0007235 | closed | | strings_czech.txt sets wrong charset |
related to | 0007319 | closed | grangeway | [all lang] Can not display CJK characters |
related to | 0006155 | closed | siebrand | [all lang] Using different language can cause mantis posted issues not to be viewable. |
related to | 0006217 | closed | vboctor | [all lang] Wrong fIlename on download |
related to | 0006441 | closed | achumakov | I have created an english_utf8 locale |
related to | 0006505 | closed | grangeway | [zh_TW] Chinese_Simplified_UTF8 cannot display correctly in phpMyAdmin |
related to | 0007400 | closed | siebrand | [all lang] When using UTF8 for encoding all reports some fields' contents are incorrectly truncated. |
related to | 0004742 | closed | siebrand | [all lang] Mix languages in messages |
related to | 0007433 | closed | achumakov | Setting encoding works only on main page |
related to | 0005850 | closed | siebrand | [CJK] Section titles are garbled under Japanese. |
related to | 0005104 | closed | achumakov | [all lang] IE 6.0 and Page encoding ISO-8859-2, and special characters õû |
related to | 0007481 | closed | grangeway | Problem with special caracters |
child of | 0004181 | closed | Features in Mantis 1.1 release |
I'll try to put a few Russian symbols here (should become UTF-8 metasymbols): проверим, как оно работает. |
|
It doesn't work correctly, as everybody can see. |
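The mismatch described in the report can be reproduced in a few lines. This is a minimal Python sketch, not Mantis code; the sample word and variable names are assumptions chosen for illustration.

```python
# Illustrative sample text (an assumption, not taken from the report).
text = "Привет"

# What a page served with the Russian locale would store: Windows-1251 bytes.
stored = text.encode("cp1251")

# What a page served with the English locale does with those same bytes.
seen_as_latin1 = stored.decode("iso-8859-1")

# If every page uses UTF-8, writer and reader agree on the decoding.
utf8_roundtrip = text.encode("utf-8").decode("utf-8")

print(repr(seen_as_latin1))    # accented Latin letters instead of Cyrillic
print(utf8_roundtrip == text)
```

The bytes themselves are not damaged in the mismatched case; only the charset label is wrong, which is why decoding them again with the right codepage recovers the text.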
|
I think it would be a good idea to convert everything to UTF-8. The problem is converting the old content (and figuring out what codepage that is in). Any ideas? |
|
PHP has a very good function for converting between charsets, including UTF-8: iconv(). Sometimes it takes some work to get it running in PHP, as support for it is not always included, but I think using it is better than nothing. Probably the best approach is to make this an option. Converting everything to UTF-8 is not a real problem, thanks to a nice property of UTF-8: even if I just put charset="UTF-8" in the English localization file, this won't affect ordinary 7-bit characters. The only problem is converting all previous input. But since Mantis was initially practically unable to work with many languages simultaneously, all 8-bit strings inside it are in the same codepage. At the very least, each project has strings in only one codepage. In that case the conversion is quite simple: just run $string = iconv($source_codepage, 'UTF-8', $string) on all strings in the database. Further, as not all browsers (and mail clients) may support UTF-8, it would be very good to add a choice of display codepage (which can be handled by ob_iconv_handler) and email codepage to the personal settings, so the system stays usable even for those who use Lynx and Pine. (I have some experience converting a multilingual web-based system to UTF-8 output, and I can say that adding initial support for UTF-8 is quite a simple task. You just need to carefully check all the places where the codepage matters; usually there are surprisingly few of them.) edited on: 08-08-04 01:14 |
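The conversion step and the 7-bit property mentioned above can be sketched as follows. The sketch is in Python rather than PHP so that it is self-contained; `to_utf8` is a hypothetical helper playing the role of iconv($source_codepage, 'UTF-8', $string), and the sample rows are made up.

```python
def to_utf8(raw: bytes, source_codepage: str) -> bytes:
    """Re-encode bytes stored in source_codepage as UTF-8."""
    return raw.decode(source_codepage).encode("utf-8")

# Hypothetical stored values: one pure ASCII, one Russian in Windows-1251.
ascii_row = b"Plain 7-bit text"
russian_row = "проверим".encode("cp1251")

converted_ascii = to_utf8(ascii_row, "cp1251")
converted_russian = to_utf8(russian_row, "cp1251")

# 7-bit text is byte-for-byte identical after conversion.
print(converted_ascii == ascii_row)
# 8-bit text now decodes correctly as UTF-8.
print(converted_russian.decode("utf-8"))
```

Running such a conversion twice on the same row would corrupt it, since the second pass would treat UTF-8 bytes as if they were still in the old codepage, so a real migration would presumably need to track which rows have already been converted.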
|
Yes, I've been looking into the conversion features and the problem is just figuring out what the old codepage is. I don't think we can take into account different codepages in one database, so I guess we'll just have to convert from the default codepage of the default language? Or alternatively, from the default codepage of the user's language (the user who posted that bug, bugnote, et cetera). What do you think? |
|
I don't think relying on the user's preferences is reliable. Since codepage conversion should be considered irreversible, it would be better to delegate the choice of source codepage to somebody more responsible and make it a controlled process. Moreover, somebody could already have switched to UTF-8 - 0004195. So I suggest putting this somewhere in the admin area with a BIG warning asking to back up the database first. Then ask for the source codepage (probably for each project) and start the conversion. To make it a bit handier, we can extract a random piece of text with 8-bit characters and convert it as an example, so it can be checked whether the selected codepage is correct. I'm not sure how to get a list of supported codepages - currently I have only one idea, but it'll work for Unix'es only - execute |
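The sample-preview idea could look roughly like this. It is a Python sketch under stated assumptions: `find_sample` and `preview` are hypothetical helpers, the rows are invented, and a real admin page would read them from the database.

```python
def find_sample(rows, max_len=60):
    """Return the start of the first stored value containing a byte >= 0x80."""
    for raw in rows:
        if any(b >= 0x80 for b in raw):
            # Cutting at max_len is safe for single-byte codepages only;
            # a multi-byte source could be split in the middle of a character.
            return raw[:max_len]
    return None

def preview(sample: bytes, candidates):
    """Decode the sample under each candidate codepage; None means it failed."""
    result = {}
    for name in candidates:
        try:
            result[name] = sample.decode(name)
        except (UnicodeDecodeError, LookupError):
            result[name] = None
    return result

# Hypothetical stored values, as raw bytes.
rows = [b"ascii only", "проверка".encode("cp1251")]
sample = find_sample(rows)
previews = preview(sample, ["cp1251", "iso-8859-1", "utf-8"])

for name, shown in previews.items():
    print(name, repr(shown))
```

The administrator would pick the candidate whose preview is readable. A failed UTF-8 decode is also a useful hint that the data has not already been converted.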
|
astax, with your extensive experience on the subject, do you think you could work on conversion routines? I've been working on the conversion towards gettext and when I do that, I also want to convert the language files to UTF-8 so that would be the time to convert the databases as well. |
|
I would be happy to do this, but sorry, I don't think I'll be able to. I just have no free time for this. |
|
Reminder sent to: zerogan |
|
Now I will input Chinese Simplified Characters. Can it be displayed correctly? |
|
jlatour, astax: I've converted all the langfiles to utf8 and made an appropriate patch to config: I'll keep this utf-8 sync to CVS HEAD. All we need is:
|
|
Vote 1 for this for 1.1 :) http://bugs.scribus.net/view.php?id=4454 |
|
I have changed the Mantis langfiles to be utf-8 by default. All current strings-<language>.txt files are utf-8 now, and only they are shown in the language picker by default. Other encodings are preserved in the /lang/ directory until somebody helps us with a migration routine for existing databases. See lang/langreadme.txt So, Mantis is now basically utf-8. Please test and report what else needs to be done :) |
|
I checked whether some languages were still not in UTF-8. Under Unix, you can use the following : It returns some false results, but it shouldn't miss any non-UTF-8 files. Only two languages haven't been converted : strings_czech.txt:$s_charset = 'iso-8859-2'; I think it would be easy to convert it with vim (:set encoding=iso-8859-2 and :set fileencoding=utf-8 and save) but I can't check whether it works as I don't have any font installed that supports the iso-8859-2 charset. |
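One way to run a check of this kind without relying on the declared $s_charset is to test whether each file's bytes decode as UTF-8. This is a Python sketch and not the command used above; the function names and sample strings are assumptions.

```python
def is_valid_utf8(data: bytes) -> bool:
    """True if the bytes form a well-formed UTF-8 sequence."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

def non_utf8_files(paths):
    """Return the paths whose contents do not decode as UTF-8."""
    bad = []
    for path in paths:
        with open(path, "rb") as handle:
            if not is_valid_utf8(handle.read()):
                bad.append(path)
    return bad

# In-memory demonstration with made-up language strings.
czech_latin2 = "Přihlásit".encode("iso-8859-2")
czech_utf8 = "Přihlásit".encode("utf-8")

print(is_valid_utf8(czech_latin2))
print(is_valid_utf8(czech_utf8))
```

A file containing only 7-bit characters passes this check whatever its declared charset is, so the check catches undecodable bytes rather than wrong labels.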
|
Now multi-byte UTF-8 strings are not correctly wrapped in email notifications. For the most common two-byte symbols, line length is around 40 characters instead of 80. |
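The halved line length is what one would expect if the wrapping code counts bytes instead of characters. The following Python sketch illustrates the arithmetic only; `chars_fitting` is a hypothetical helper and not the Mantis wrapping routine.

```python
def chars_fitting(text: str, byte_limit: int) -> int:
    """Count how many leading characters of text fit in byte_limit UTF-8 bytes."""
    used = 0
    count = 0
    for ch in text:
        size = len(ch.encode("utf-8"))
        if used + size > byte_limit:
            break
        used += size
        count += 1
    return count

cyrillic_line = "я" * 80   # 80 characters, two bytes each in UTF-8
ascii_line = "a" * 80      # 80 characters, one byte each

fit_cyrillic = chars_fitting(cyrillic_line, 80)
fit_ascii = chars_fitting(ascii_line, 80)

print(len(cyrillic_line), len(cyrillic_line.encode("utf-8")))
print(fit_cyrillic, fit_ascii)
```

Wrapping on character counts, for example with multi-byte-aware string functions, would give the same 80-column lines for both inputs.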
|
I found that the body of an email is cut off when Mantis fails to wrap relationship summary. mbstring.func_overload = 2 Is this intended for using utf-8 with Mantis? |
|
Update planned to 1.2.x |
|
All dependencies are resolved. Yay! |
|