View Issue Details

ID: 0014744
Project: mantisbt
Category: bugtracker
View Status: public
Last Update: 2014-12-08 02:07
Reporter: phyllisl
Assigned To: dregad
Priority: normal
Severity: major
Reproducibility: always
Status: closed
Resolution: fixed
Product Version: 1.3.0dev
Target Version: 1.3.0-beta.1
Summary: 0014744: Unicode characters in text field prevent bug display
Description

When special Unicode characters are present in one of an issue's text fields (Description, Steps To Reproduce, Additional Information) or in a bugnote's text, the issue can no longer be displayed in the view issue details page.

The following error is triggered:


XML Parsing Error: not well-formed
Location: http://localhost/mantis/dev/view.php?id=11
Line Number 256, Column 22:
NOTICE 0674 THRD: xÆ·xÆ·

-----------------------------------^

Steps To Reproduce

In the view issue details page for a bug:

  • Copy-paste the text below in the "add note" text field

    NOTICE 0674 THRD: xÆ·xÆ·
    NOTICE 0674 THRD: ø@¯ø@
  • Click Add Note button
Tags: No tags attached.

Relationships

related to 0014811 (closed, dregad): Summary page doesn't generate properly because of special characters
related to 0014157 (closed, rombert): Array to string conversion error on soap request with PHP 5.4
related to 0015721 (closed, grangeway): Functionality to consider porting to master-2.0.x
related to 0014721 (closed, dregad): Error page when generate summary in HTML format
related to 0011230 (closed, rombert): High-ascii characters in fields will cause invalidity in XML.

Activities

dregad

2012-09-26 06:10

developer   ~0032944

It would appear that there are special, non-printing chars in data you provided.

I copy/pasted the data into my editor, and it showed a 'DC3' char before the 'Æ' symbol, and an 'SOH' after the 'ø'. If I save the file and dump its contents as hex, here's what happens:


$ hexdump -C /tmp/foo
00000000  4e 4f 54 49 43 45 20 30  36 37 34 20 54 48 52 44  |NOTICE 0674 THRD|
00000010  3a 20 78 13 c3 86 c2 b7  78 13 c3 86 c2 b7 0a 4e  |: x.....x......N|
00000020  4f 54 49 43 45 20 30 36  37 34 20 54 48 52 44 3a  |OTICE 0674 THRD:|
00000030  20 c3 b8 01 40 c2 af c3  b8 01 40 0a              | ...@.....@.|
0000003c

Note the 'DC3' maps to 0x13 and 'SOH' to 0x01.
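
For anyone wanting to reproduce this check without an editor, here is a minimal Python sketch (not part of MantisBT; the helper name `find_control_chars` is made up for illustration). It decodes the bytes from the hexdump above as UTF-8 and lists every control character other than tab, LF and CR.

```python
import unicodedata

# Bytes copied from the hexdump above.
DUMP = (
    "4e 4f 54 49 43 45 20 30 36 37 34 20 54 48 52 44 "
    "3a 20 78 13 c3 86 c2 b7 78 13 c3 86 c2 b7 0a 4e "
    "4f 54 49 43 45 20 30 36 37 34 20 54 48 52 44 3a "
    "20 c3 b8 01 40 c2 af c3 b8 01 40 0a"
)


def find_control_chars(text):
    """Return (index, code point) pairs for control chars other than tab, LF, CR."""
    allowed = {"\t", "\n", "\r"}
    return [
        (index, ord(ch))
        for index, ch in enumerate(text)
        # Unicode category "Cc" covers the C0/C1 control characters.
        if unicodedata.category(ch) == "Cc" and ch not in allowed
    ]


text = bytes.fromhex(DUMP).decode("utf-8")
hits = find_control_chars(text)
for index, code in hits:
    print(f"character {index}: U+{code:04X}")
```

With the data above, the only hits should be the 0x13 (DC3) and 0x01 (SOH) characters.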

1.3 chokes on these due to strict html checking, and I guess these control chars are considered invalid in the DTD.

I'm not really sure how to fix this at the moment; in the meantime, as a workaround, I suggest you manually edit the bugnote to remove the control chars.

atrol

2012-09-26 06:35

developer   ~0032945

As a temporary workaround, I suggest changing http_api.php line 135
from
header( 'Content-Type: application/xhtml+xml; charset=UTF-8' );
to
header( 'Content-Type: text/html; charset=UTF-8' );

and html_api.php line 321
from
echo "\t", '<meta http-equiv="Content-type" content="application/xhtml+xml; charset=UTF-8" />', "\n";
to
echo "\t", '<meta http-equiv="Content-type" content="text/html; charset=UTF-8" />', "\n";

dregad

2012-09-26 12:12

developer   ~0032950

Last edited: 2012-09-26 12:34

@atrol - not sure dhx would like your workaround ;)

To get to the bottom of this issue:

  • user's data contains control characters
  • valid "data" characters (i.e. displayable text in this case) as per current version of XML 1.0 specification [1] exclude all control characters except tab, cr and lf

Therefore, I believe that the proper fix would be to revise string_html_specialchars() to exclude invalid characters from the display string.

Suggested change https://github.com/dregad/mantisbt/commit/fix-14744

Thoughts ?

[1] http://www.w3.org/TR/2008/REC-xml-20081126/#NT-Char
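
As a rough illustration of the approach proposed in this note, the sketch below strips every character that falls outside the XML 1.0 `Char` production [1]. It is written in Python rather than MantisBT's PHP, the function name `strip_invalid_xml_chars` is hypothetical, and it is not the code from the suggested commit.

```python
import re

# Complement of the XML 1.0 "Char" production:
#   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_INVALID_XML_CHARS = re.compile(
    r"[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_invalid_xml_chars(text):
    """Remove characters that XML 1.0 does not allow in character data."""
    return _INVALID_XML_CHARS.sub("", text)


# The DC3 control character (0x13) is dropped, the visible text is kept.
print(repr(strip_invalid_xml_chars("x\x13Æ·")))  # 'xÆ·'
```

Silently dropping characters is only one possible policy; replacing them with a visible placeholder would be an alternative, at the cost of altering what the user sees.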

dregad

2012-09-26 12:24

developer   ~0032952

Reminder sent to: dhx

dhx, any feedback or suggestions ?

phyllisl

2012-09-26 12:51

reporter   ~0032953

Last edited: 2012-09-26 18:50

EDIT dregad: see separate issue 0014748 for follow up on the note below.


There is another error that occurs when an issue has a JPG image attachment. When generating the summary report in HTML, it breaks on line 194, col 194:

<td class="print" colspan="5">
    Error#504.jpg (116,726) <span class="italic">2012-09-25 16:33</span><br />http://10.96.99.119/mantisbt/file_download.php?file_id=1&type=bug<br /><img src="file_download.php?file_id=1&type=bug" alt="jpg" /><br />

It outputs fine after I changed the '&' before "type=bug" to '&amp;'.

dregad

2012-09-26 13:18

developer   ~0032954

phyllisl, your note 0014744:0032953 has nothing to do with the problem at hand. Can you kindly

  1. open a separate issue for each individual problem, and
  2. trim your screenshot attachments, to avoid unnecessary clutter

Furthermore, your last screenshot is useless, as it does not even show the character where the error occurs. When you have text output, it is generally better to copy-paste the text rather than an image, and include it (within html 'pre' tags if necessary) in your bug report or note.

Thanks for your understanding

dregad

2012-09-26 13:20

developer   ~0032955

In addition, your feedback on 0014744:0032950 would be appreciated - does that fix your problem ?

atrol

2012-09-26 13:47

developer   ~0032956

Last edited: 2012-09-26 13:59

@atrol - not sure dhx would like your workaround ;)
Probably he will not like it.
That's why I wrote TEMPORARY WORKAROUND and not something like FIX ;)

Maybe we should TEMPORARILY add something like $g_enable_browser_xhtml_validation = ON/OFF

[Edited]
Forget $g_enable_browser_xhtml_validation = ON/OFF
Users will set it to OFF and we will no longer get any reports for issues caused by the validation.

phyllisl

2012-09-26 14:06

reporter   ~0032957

0014744:0032950

This does fix the problem of special characters and the summary page is generated.

Thank you!

dregad

2012-09-26 18:56

developer   ~0032958

Thanks for your feedback.

I removed the attachment related to note 0014744:0032953 and added it to 0014748

phyllisl

2012-10-09 19:04

reporter   ~0033110

Just found out that if you have special characters in the name of a custom field, or in the name of a project, the summary report does not generate properly because of the special characters.

Apparently, string_html_specialchars() is needed at two other spots in summary_api.php, at around lines 57 and 394.

dregad

2012-10-10 08:30

developer   ~0033123

Follow up on 0014744:0033110 in new issue 0014811

rombert

2012-11-14 17:06

reporter   ~0034319

I've applied the same fix to the SOAP API on the master-1.2.x branch to make sure it outputs correct XML and it does. But on the other hand it strips out (some?) non-Latin-1 characters, for instance you will not see anything written in Russian.

I've added a failing test to the SOAP API with https://github.com/mantisbt/mantisbt/commit/88a332a6c980ecf827ac654798e8b5317cf233ab . The same strings fail to be displayed on an installation from master so I also reopen this issue.

I'm not sure if the characters are valid for UTF-8 or not, but I'm sure we need to find a solution for outputting them.

dregad

2012-11-15 13:39

developer   ~0034328

See 0014157 for a proposed revised regex which (as far as I can tell) does not filter out Russian chars, and hopefully also works for the rest of it ;-)

Feedback appreciated.

dregad

2012-11-15 17:59

developer   ~0034331

Added commit with new regex based on rombert's feedback in 0014157.

grangeway

2013-04-05 17:56

reporter   ~0036139

Marking as 'acknowledged' rather than resolved/closed, to track that the change gets ported to the master-2.0.x branch.

Related Changesets

MantisBT: master 2b5d6621

2012-09-26 05:20

dregad


Remove invalid chars from displayed string per XML specification

Strict XHTML requires that data comply with XML 1.0 specification [1],
which only allows a subset of the UTF-8 charset.

Function string_html_specialchars() has been modified to remove from the
string to print, any character which is not in the defined range

Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] |
[#x10000-#x10FFFF]

Fixes 0014744

[1] http://www.w3.org/TR/xml/
Affected Issues
0014744
mod - core/string_api.php

MantisBT: master ff2e6506

2012-11-15 17:32

dregad


Fix regex to remove UTF-8 chars invalid in XML 1.0

The regex introduced in string_html_specialchars() function with commit
2b5d66217bd4ecf5e7271f1a4b2b339d7681e91c caused problems with multibyte
UTF-8 chars, as PCRE require that they are specified like '\x{NNNN}';
the syntax without braces '\xNN' only supports up to 2 hex digits [1].

Fixes 0014744

[1] http://php.net/regexp.reference.escape
Affected Issues
0014744
mod - core/string_api.php
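
The two-hex-digit pitfall described in the second changeset can be reproduced by analogy in Python, whose `re` module also reads exactly two hex digits after `\x`. The sketch below is an illustration only: the `BUGGY` pattern is a hypothetical reconstruction and not the actual regex from commit 2b5d6621, and Python spells wide code points as `\uNNNN` or `\UNNNNNNNN` where PCRE uses `\x{NNNN}`.

```python
import re

# Hypothetical pattern with unbraced escapes. "\xD7FF" is parsed as "\xD7"
# followed by the literal characters "F" and "F", so the class effectively
# allows only tab, LF, CR and the range 0x20-0xFF.
BUGGY = re.compile(r"[^\x09\x0A\x0D\x20-\xD7FF\xE000-\xFFFD]")

# Same intent with full-width escapes covering the whole XML 1.0 Char range.
FIXED = re.compile(
    r"[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)

russian = "Привет"
with_control_char = "x\x13Æ·"

print(repr(BUGGY.sub("", russian)))            # '' (all Cyrillic removed)
print(repr(FIXED.sub("", russian)))            # 'Привет'
print(repr(FIXED.sub("", with_control_char)))  # 'xÆ·'
```

The buggy variant still removes the control character, which is why the problem only became visible once text outside the 0x20-0xFF range, such as Russian, was passed through it.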