View Issue Details

ID: 0004084
Project: mantisbt
Category: localization
View Status: public
Last Update: 2009-06-23 15:26
Reporter: astax
Assigned To: siebrand
Priority: normal
Severity: feature
Reproducibility: always
Status: closed
Resolution: fixed
Target Version: 1.2.0rc1
Fixed in Version: 1.2.0rc1
Summary: 0004084: [all lang] Use UTF-8 codepage
Description

Currently, using the system in several languages can cause problems, because languages with non-Latin scripts use different charsets. For example, if I type something in Russian with the Russian locale selected in Mantis, my input is saved in the Windows-1251 codepage. But when someone using the English locale in Mantis looks at my comments, they see them in the iso-8859-1 or windows-1252 codepage and obviously cannot read my text. The situation is even worse when I use the English locale and enter Russian text: depending on the browser, either I won't be able to enter Russian characters at all, or they'll be put into the field as UTF-8 characters. Either way, entering Russian text is impossible.
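The mismatch described above can be reproduced in a few lines. This is a minimal Python sketch, not Mantis code; the sample word and variable names are illustrative assumptions. It stores a Russian word as Windows-1251 bytes, reads the same bytes back as Windows-1252, and compares that with a UTF-8 round trip.

```python
# Illustrative only: the sample text and names are assumptions, not Mantis code.
text = "проверим"

# What a Russian-locale page would store: Windows-1251 bytes.
stored = text.encode("cp1251")

# What an English-locale page would show: the same bytes read as Windows-1252.
garbled = stored.decode("cp1252")
print(garbled)

# With UTF-8 used for both storing and displaying, the text survives.
round_trip = text.encode("utf-8").decode("utf-8")
print(round_trip == text)
```

The bytes themselves are never damaged in the first case; only the label used to interpret them differs, which is why a single shared encoding removes the problem.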

The only correct fix I can see is to add an option for using the UTF-8 codepage globally. It probably should not be forced for everything, but we need at least an option.

Any comments?

Tags: No tags attached.

Relationships

parent of 0008352 [closed, grangeway] Upgrading from 1.1.0a3 to 1.1.0a4 generating non correct visualization for 'é, è, à, ù, ..' characters
parent of 0008230 [closed, siebrand] Character encoding in Mantis 1.1.0a4 on the Bugtracker site
parent of 0007472 [closed, siebrand] [th] Thai chars are broken on Excel export; column header doesn't show
parent of 0009118 [closed, siebrand] Bugtracker does not handle UTF-8 formatted text
has duplicate 0006144 [closed, achumakov] Czech: encoding problem (Iiso <-> utf8)
has duplicate 0005401 [closed, jlatour] Cyrillic characters encoding problem
has duplicate 0006226 [closed, ryandesign] Invalid encoding of pages
has duplicate 0007406 [closed, ryandesign] Handing of accents not working using UTF8
related to 0004085 [closed, siebrand] The system complains about missing required fields
related to 0004195 [closed, siebrand] Mails do not show national characters correctly
related to 0003812 [closed, siebrand] [jp] Mantis send broken email when lang=Japanese_euc
related to 0005767 [closed, grangeway] Win32 MySQL 4.1.12a-nt default character set
related to 0006536 [closed] Mantis display a error infomation when create a chinese project
related to 0007235 [closed, Wanderer] strings_czech.txt sets wrong charset
related to 0007319 [closed, grangeway] [all lang] Can not display CJK characters
related to 0006155 [closed, siebrand] [all lang] Using different language can cause mantis posted issues not to be viewable.
related to 0006217 [closed, vboctor] [all lang] Wrong fIlename on download
related to 0006441 [closed, achumakov] I have created an english_utf8 locale
related to 0006505 [closed, grangeway] [zh_TW] Chinese_Simplified_UTF8 cannot display correctly in phpMyAdmin
related to 0007400 [closed, siebrand] [all lang] When using UTF8 for encoding all reports some fields' contents are incorrectly truncated.
related to 0004742 [closed, siebrand] [all lang] Mix languages in messages
related to 0007433 [closed, achumakov] Setting encoding works only on main page
related to 0005850 [closed, siebrand] [CJK] Section titles are garbled under Japanese.
related to 0005104 [closed, achumakov] [all lang] IE 6.0 and Page encoding ISO-8859-2, and special characters õû
related to 0007481 [closed, grangeway] Problem with special caracters
child of 0004181 [closed] Features in Mantis 1.1 release

Activities

astax

2004-07-14 02:34

reporter   ~0006035

I'll try to put a few Russian symbols here (they should become UTF-8 metasymbols): проверим, как оно работает.

astax

2004-07-14 02:36

reporter   ~0006036

It doesn't work correctly, as everybody can see.

jlatour

2004-08-06 11:05

reporter   ~0006710

I think it would be a good idea to convert everything to UTF-8. The problem is converting the old content (and figuring out what codepage that is in).

Any ideas?

astax

2004-08-08 01:06

reporter   ~0006791

Last edited: 2004-08-08 01:14

PHP has a very good function for converting between charsets, including UTF-8: iconv(). But sometimes it takes some work to get it working in PHP, as support for it is not always included. Still, I think using it is better than nothing. Probably the best approach is to make this an option.

Converting everything to UTF-8 is not a real problem. That's a nice property of UTF-8: even if I just put charset="UTF-8" in the English localization file, this won't affect ordinary 7-bit characters.

The only problem is converting all previous input. But as initially Mantis practically wasn't able to work with many languages simultaneously, that means all 8-bit strings inside are in the same codepage. At least each project has strings only in one codepage. In this case, conversion will be quite simple - just need to run $string = iconv($source_codepage, 'UTF-8', $string) on all strings in the database.
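The per-string conversion described above can be sketched as follows. This is a Python stand-in for the PHP iconv() call, not the migration code that was eventually used; the function name `to_utf8` and the sample rows are hypothetical.

```python
def to_utf8(raw: bytes, source_codepage: str) -> bytes:
    """Re-encode bytes from a legacy codepage to UTF-8.

    Raises UnicodeDecodeError if the bytes are not valid in source_codepage.
    """
    return raw.decode(source_codepage).encode("utf-8")


# Hypothetical database values, all stored in Windows-1251.
rows = ["Ошибка".encode("cp1251"), b"plain ASCII"]
converted = [to_utf8(row, "cp1251") for row in rows]

# 7-bit ASCII is unchanged by the conversion; Cyrillic becomes two bytes per letter.
print(converted)
```

Because 7-bit strings pass through untouched, such a conversion only changes rows that actually contain 8-bit characters, which matches the point about ordinary 7-bit characters made earlier in this note.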

Further, as not all browsers (and mail clients) may support UTF-8, it would be very good to add a choice of display codepage (which can be handled by ob_iconv_handler) and email codepage to the personal settings. Then the system will be usable even by those who use Lynx and Pine.

(I have some experience converting a multilingual web-based system to work with UTF-8 output, and I can say that adding initial UTF-8 support is quite a simple task. You just need to carefully check all the places where the codepage matters; usually there are surprisingly few of them.)

jlatour

2004-08-08 02:52

reporter   ~0006794

Yes, I've been looking into the conversion features, and the problem is just figuring out what the old codepage is. I don't think we can take into account different codepages in one database, so I guess we'll just have to convert from the default codepage of the default language? Or, alternatively, from the default codepage of the user's language (the user who posted that bug, bugnote, et cetera).

What do you think?

astax

2004-08-08 03:26

reporter   ~0006795

I don't think it's reliable to depend on the user's preferences. As codepage conversion should be considered irreversible, it would be better to delegate the choice of source codepage to somebody more responsible and make this a controlled process. Moreover, somebody could have already switched to UTF-8 (see 0004195). So I suggest putting this somewhere in the admin area with a BIG warning asking to back up the database first. Then ask for the source codepage (probably for each project) and start the conversion. To make it a bit handier, we can extract a random piece of text with 8-bit characters and convert it as an example, so it can be checked whether the selected codepage is correct.
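The preview step suggested above might look roughly like this. It is a Python sketch under stated assumptions: the helper names `preview_conversion` and `has_8bit`, the sample values, and the list of candidate codepages are all hypothetical, and a real admin page would draw its samples from the database.

```python
def preview_conversion(raw: bytes, candidates: list[str]) -> dict[str, str]:
    """Decode one stored sample under each candidate codepage for visual review."""
    previews = {}
    for codepage in candidates:
        try:
            previews[codepage] = raw.decode(codepage)
        except UnicodeDecodeError:
            previews[codepage] = "<not valid in this codepage>"
    return previews


def has_8bit(raw: bytes) -> bool:
    """True if the sample contains bytes outside 7-bit ASCII."""
    return any(byte >= 0x80 for byte in raw)


# Hypothetical stored values; only the second one is useful for a preview.
samples = [b"login fails", "не работает".encode("cp1251")]
sample = next(s for s in samples if has_8bit(s))

for codepage, shown in preview_conversion(sample, ["cp1251", "koi8_r", "utf-8"]).items():
    print(f"{codepage}: {shown}")
```

Pure ASCII samples are skipped because they look identical under every candidate, so they cannot help the administrator tell a right choice from a wrong one.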

I'm not sure how to get a list of supported codepages. Currently I have only one idea, and it will work on Unix systems only: execute iconv -l and parse its output. Probably not a very good solution, though...
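Parsing that output could be sketched as below. This Python example makes assumptions: the exact layout of `iconv -l` output differs between implementations, so the sample text here is made up to resemble two common layouts, and `parse_iconv_list` is a hypothetical helper. On a live system the text would come from running the command itself.

```python
import re


def parse_iconv_list(output: str) -> list[str]:
    """Extract encoding names from text in the style of `iconv -l` output.

    Names may be separated by spaces, commas, or newlines, and some
    implementations append "//" to each name; both are handled here.
    """
    names = []
    for token in re.split(r"[\s,]+", output):
        name = token.rstrip("/")
        if name and name not in names:
            names.append(name)
    return names


# Made-up sample resembling the two common layouts; real output is much longer.
sample_output = "ISO-8859-1//\nISO-8859-2//\nCP1251 WINDOWS-1251\nUTF-8, UTF8\n"
print(parse_iconv_list(sample_output))
```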

jlatour

2004-08-08 04:05

reporter   ~0006797

astax, with your extensive experience on the subject, do you think you could work on conversion routines? I've been working on the conversion towards gettext and when I do that, I also want to convert the language files to UTF-8 so that would be the time to convert the databases as well.

astax

2004-08-08 08:42

reporter   ~0006802

I would be happy to do this, but sorry, I don't think I'll be able to. I just have no free time for this.

zerogan

2005-12-17 11:00

reporter   ~0011802

Reminder sent to: zerogan

zerogan

2005-12-18 07:43

reporter   ~0011808

Now I will input Chinese Simplified characters.
"大家好!你们看到了吗?"

Can it be displayed correctly?

achumakov

2006-10-02 09:51

reporter   ~0013575

jlatour, astax: I've converted all the langfiles to UTF-8 and made an appropriate patch to the config:
http://www.chumakov.ru/mantis/mantis-110a-utf8.zip
Mantis now looks like http://www.chumakov.ru/images/Mantis_ML.gif

I'll keep this UTF-8 version in sync with CVS HEAD.

All we need is:

  • get rid of mb-string issues if any (string length, case conversion, search etc)
  • create database with utf-8 charset and collation by default
  • make a migration path for the existing installations:
    • convert each and every db string from user-specified encoding to utf-8
    • change DB encoding and collation to utf-8
cbradney

2006-10-29 19:30

reporter   ~0013658

Vote 1 for this for 1.1 :) http://bugs.scribus.net/view.php?id=4454

achumakov

2006-11-08 08:43

reporter   ~0013700

I have changed the Mantis langfiles to be utf-8 by default.

All current strings_<language>.txt files are UTF-8 now, and only they are shown in the language picker by default.

Other encodings are preserved in the /lang/ directory until somebody helps us with a migration routine for existing databases. See lang/langreadme.txt

So, Mantis is now basically UTF-8. Please test and report what else needs to be done :)

lifo2

2007-02-06 08:24

reporter   ~0014005

I checked whether some languages were still not in UTF-8. Under Unix, you can use the following:
grep s_charset * | grep -v utf-8 | grep -v '[0-9].txt'

It returns some false positives, but it shouldn't miss any non-UTF-8 files. Only two languages haven't been converted:

strings_czech.txt:$s_charset = 'iso-8859-2';
strings_polish.txt:$s_charset = 'iso-8859-2';

I think it would be easy to convert them with vim (:set encoding=iso-8859-2 and :set fileencoding=utf-8, then save), but I can't check whether it works, as I don't have any font installed that supports the iso-8859-2 charset.
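As an alternative to the editor route, the conversion can be checked without any font installed, since valid UTF-8 output can be verified programmatically. The following Python sketch is an assumption-laden illustration, not the procedure actually used for these files: `convert_langfile` and the `$s_example` string are hypothetical, and the input is a made-up two-line excerpt.

```python
def convert_langfile(raw: bytes, source_codepage: str = "iso-8859-2") -> bytes:
    """Re-encode a language file to UTF-8 and update its $s_charset line."""
    text = raw.decode(source_codepage)
    text = text.replace(f"$s_charset = '{source_codepage}';", "$s_charset = 'utf-8';")
    return text.encode("utf-8")


# Hypothetical two-line excerpt of a Czech language file stored as ISO-8859-2.
original = "$s_charset = 'iso-8859-2';\n$s_example = 'Příliš žluťoučký kůň';\n".encode("iso-8859-2")
converted = convert_langfile(original)

# Decoding as UTF-8 succeeds only if the conversion produced valid UTF-8.
print(converted.decode("utf-8"))
```

Updating the declared charset together with the bytes matters here, because a file whose content is UTF-8 but whose `$s_charset` still says iso-8859-2 would be mislabeled.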

astax

2007-03-04 23:08

reporter   ~0014126

Now multi-byte UTF-8 strings are not correctly wrapped in email notifications. For most common two-byte symbols, line length is around 40 characters instead of 80.
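One plausible explanation, assumed here rather than confirmed in this thread, is that the wrapping code counts bytes instead of characters. The Python sketch below only illustrates that arithmetic; `chars_fitting` is a hypothetical helper and not part of Mantis.

```python
def chars_fitting(text: str, byte_limit: int) -> int:
    """Count how many leading characters of text fit in byte_limit UTF-8 bytes."""
    used = 0
    for count, char in enumerate(text):
        used += len(char.encode("utf-8"))
        if used > byte_limit:
            return count
    return len(text)


cyrillic = "я" * 100    # two bytes per character in UTF-8
ascii_text = "a" * 100  # one byte per character

print(chars_fitting(ascii_text, 80))
print(chars_fitting(cyrillic, 80))
```

With an 80-byte limit, the ASCII line holds 80 characters while the Cyrillic line holds 40, which matches the halved line length reported above.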

ave

2007-03-22 12:53

reporter   ~0014233

I found that the body of an email is cut off when Mantis fails to wrap the relationship summary.
I had to modify my php.ini to avoid this:

mbstring.func_overload = 2
mbstring.internal_encoding = UTF-8

Is this the intended configuration for using UTF-8 with Mantis?

siebrand

2009-04-27 16:55

developer   ~0021706

Update planned to 1.2.x

siebrand

2009-06-16 17:16

developer   ~0022180

All dependencies are resolved. Yay!