View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update |
---|---|---|---|---|---|
0014744 | mantisbt | bugtracker | public | 2012-09-26 04:31 | 2014-12-08 02:07 |
Reporter | phyllisl | Assigned To | dregad | ||
Priority | normal | Severity | major | Reproducibility | always |
Status | closed | Resolution | fixed | ||
Product Version | 1.3.0dev | ||||
Target Version | 1.3.0-beta.1 | ||||
Summary | 0014744: Unicode characters in text field prevent bug display | ||||
Description | When special Unicode characters are present in one of an issue's text fields (Description, Steps To Reproduce, Additional Information) or in a bugnote's text, the issue can no longer be displayed in the view issue details page. The following error is triggered: | ||||
Steps To Reproduce | In the view issue details page for a bug:
| ||||
Tags | No tags attached. | ||||
related to | 0014811 | closed | dregad | Summary page doesn't generate properly because of special characters |
related to | 0014157 | closed | rombert | Array to string conversion error on soap request with PHP 5.4 |
related to | 0015721 | closed | grangeway | Functionality to consider porting to master-2.0.x |
related to | 0014721 | closed | dregad | Error page when generate summary in HTML format |
related to | 0011230 | closed | rombert | High-ascii characters in fields will cause invalidity in XML. |
It would appear that there are special, non-printing chars in the data you provided. I copy/pasted the data into my editor, and it showed a 'DC3' char before the 'Æ' symbol and an 'SOH' after the 'ø'. If I save the file and dump its contents as hex, here's what happens. Note that 'DC3' maps to 0x13 and 'SOH' to 0x01. 1.3 chokes on these due to strict HTML checking, and I guess these control chars are considered invalid in the DTD. I'm not really sure how to fix this at the moment; in the meantime, as a workaround, I suggest you manually edit the bugnote to remove the control chars. |
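The check described in the note above (spotting non-printing characters such as DC3 and SOH) can be sketched in a few lines of Python. This is only an illustration: the sample string is made up to resemble the reported data, and `find_control_chars` is a hypothetical helper, not part of MantisBT.

```python
def find_control_chars(text):
    """Return (index, hex code) pairs for C0 control chars other than tab, LF and CR."""
    allowed = {"\t", "\n", "\r"}
    return [
        (index, f"0x{ord(char):02X}")
        for index, char in enumerate(text)
        if ord(char) < 0x20 and char not in allowed
    ]


# Hypothetical sample: a DC3 (0x13) before 'Æ' and an SOH (0x01) after 'ø'.
sample = "\x13Æ and ø\x01"
print(find_control_chars(sample))
```

Tab, line feed and carriage return are skipped because they are the only control characters below 0x20 that XML 1.0 permits.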
|
As a temporary workaround, I suggest changing http_api.php line 135 and html_api.php line 321 |
|
@atrol - not sure dhx would like your workaround ;) To get to the bottom of this issue:
Therefore, I believe that the proper fix would be to revise string_html_specialchars() to exclude invalid characters from the display string. Suggested change: https://github.com/dregad/mantisbt/commit/fix-14744 Thoughts? |
|
Reminder sent to: dhx. dhx, any feedback or suggestions? |
|
EDIT dregad: see separate issue 0014748 for follow-up on the note below. There is another error that occurs when an issue has a JPG image attachment. When generating the summary report in HTML, it breaks on line 194, col 194:
It outputs fine after I changed the '&' before "type=bug" alt="jpg" /><br />" to '&amp;'. |
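A bare '&' inside an attribute value is not well-formed under strict XHTML parsing, which is why the report breaks at that column. As a minimal sketch (the URL below is a hypothetical stand-in, since the real one is not shown in the note), escaping the attribute value before building the tag avoids the problem:

```python
from html import escape

# Hypothetical attachment URL; only the "&type=bug" part comes from the note.
url = "file_download.php?file_id=1&type=bug"

# Escape the attribute value so '&' becomes '&amp;' in the generated markup.
tag = f'<img src="{escape(url, quote=True)}" alt="jpg" />'
print(tag)
```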
|
phyllisl, your note 0014744:0032953 has nothing to do with the problem at hand. Can you kindly
Furthermore, your last screenshot is not useful, as it does not show the character where the error occurs. When you have text output, it is generally better to copy-paste the text rather than attach an image, and to include it (within html 'pre' tags if necessary) in your bug report or note. Thanks for your understanding |
|
In addition, your feedback on 0014744:0032950 would be appreciated - does that fix your problem ? |
|
Maybe we should TEMPORARILY add something like $g_enable_browser_xhtml_validation = ON/OFF [Edited] |
|
0014744:0032950 This does fix the problem of special characters and the summary page is generated. Thank you! |
|
Thanks for your feedback. I removed the attachment related to note 0014744:0032953 and added it to 0014748 |
|
Just found out that if you have special characters in the name of a custom field or in the name of a project, the summary report does not generate properly. Apparently, string_html_specialchars() is needed at two other spots in summary_api.php, at around lines 57 and 394 |
|
Follow up on 0014744:0033110 in new issue 0014811 |
|
I've applied the same fix to the SOAP API on the master-1.2.x branch to make sure it outputs correct XML, and it does. On the other hand, it strips out (some?) non-Latin-1 characters; for instance, you will not see anything written in Russian. I've added a failing test to the SOAP API with https://github.com/mantisbt/mantisbt/commit/88a332a6c980ecf827ac654798e8b5317cf233ab . The same strings fail to be displayed on an installation from master, so I am also reopening this issue. I'm not sure if the characters are valid for UTF-8 or not, but I'm sure we need to find a solution for outputting them. |
|
See 0014157 for a proposed revised regex which (as far as I can tell) does not filter out Russian chars, and hopefully also works for the rest of it ;-) Feedback appreciated. |
|
Added commit with new regex based on rombert's feedback in 0014157. |
|
Marking as 'acknowledged' rather than resolved/closed, to track that the change gets ported to the master-2.0.x branch |
|
MantisBT: master 2b5d6621 2012-09-26 05:20 |
Remove invalid chars from displayed string per XML specification. Strict XHTML requires that data comply with the XML 1.0 specification [1], which only allows a subset of the UTF-8 charset. Function string_html_specialchars() has been modified to remove from the string to print any character which is not in the defined range: Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]. Fixes 0014744. [1] http://www.w3.org/TR/xml/ |
Affected Issues 0014744 |
|
mod - core/string_api.php | Diff File | ||
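The filtering described in this changeset can be sketched as follows. This is a Python illustration of the XML 1.0 `Char` production quoted in the commit message, not the PHP code from core/string_api.php; `display_string` is a hypothetical name, and `html.escape` merely stands in for the HTML escaping step.

```python
import re
from html import escape

# Complement of the XML 1.0 Char production:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_INVALID_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def display_string(text):
    """Drop characters outside the XML 1.0 Char range, then escape HTML specials."""
    return escape(_INVALID_XML_CHARS.sub("", text), quote=True)


print(display_string("\x13Æ & ø\x01"))
print(display_string("Привет"))
```

Stripping happens before escaping, so the control characters never reach the output, while legitimate non-ASCII text (here 'Æ', 'ø' and Cyrillic) passes through unchanged.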
MantisBT: master ff2e6506 2012-11-15 17:32 |
Fix regex to remove UTF-8 chars invalid in XML 1.0. The regex introduced in the string_html_specialchars() function with commit 2b5d66217bd4ecf5e7271f1a4b2b339d7681e91c caused problems with multibyte UTF-8 chars, as PCRE requires that they be specified like '\x{NNNN}'; the syntax without braces '\xNN' only supports up to 2 hex digits [1]. Fixes 0014744. [1] http://php.net/regexp.reference.escape |
Affected Issues 0014744 |
|
mod - core/string_api.php | Diff File |
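The escape pitfall named in this commit message has a close analogue in Python's `re` module, where `\xhh` takes exactly two hex digits. The sketch below is a reconstruction under that assumption: the `BUGGY` pattern is illustrative and is not the regex from the original commit, which is not shown here.

```python
import re

# Assumed reconstruction of the pitfall: "\xD7FF" is parsed as "\xD7" followed
# by two literal "F" characters, so the allowed range silently ends at U+00D7.
BUGGY = re.compile(r"[^\x09\x0A\x0D\x20-\xD7FF]")

# Code points above U+00FF written with escapes that accept more than two
# hex digits (\uXXXX and \UXXXXXXXX in Python; \x{NNNN} in PCRE).
FIXED = re.compile(r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]")

russian = "Привет"
buggy_result = BUGGY.sub("", russian)
fixed_result = FIXED.sub("", russian)
print(repr(buggy_result), repr(fixed_result))
```

With the truncated range, every Cyrillic letter falls outside the allowed class and is stripped, which matches the symptom reported for Russian text; the corrected pattern keeps it while still removing control characters.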