View Issue Details

IDProjectCategoryView StatusLast Update
0027099mantisbthtmlpublic2024-04-08 08:44
Reporterloris.nardo Assigned To 
PrioritynormalSeveritymajorReproducibilityhave not tried
Status confirmedResolutionopen 
Product Version2.24.1 
Target Version2.27.0 
Summary0027099: Inconsistent escape of html entities while editing an issue
Description

If the description of an issue contains raw html entities, they are transformed back into the equivalent characters during the edit of the issue.
This can cause the description message to change its meaning especially when the replaced html entities are the less than character and the ampersand character.

Steps To Reproduce
  1. Create an issue with the attached description
  2. See the resulting issue
  3. Open the edit view for the issue
  4. Note that the description text area has different content
  5. Save the issue
  6. Note that the description of the issue has changed, also note that the less than entity has been eagerly transformed into the less than character.

This assume a default configuration for MantisBT

TagsNo tags attached.
Attached Files
description_sample.txt (25 bytes)   
<b>test</b>
description_sample.txt (25 bytes)   

Relationships

has duplicate 0008540 closeddregad & not escaped. 

Activities

hotzeplotz

hotzeplotz

2024-03-27 20:43

reporter   ~0068733

related to: 0008540

hotzeplotz

hotzeplotz

2024-03-28 06:36

reporter   ~0068736

Last edited: 2024-03-28 06:38

The content for the textarea (for editing) is processed in the same way as text for the output (for displaying).

string_api::string_textarea() calls string_api::string_html_specialchars()

return preg_replace( '/&(#[0-9]+|[a-z]+);/i', '&$1;', @htmlspecialchars( $p_string, ENT_COMPAT, 'utf-8' ) );

^ this code snippet is the perfect sample for this unwanted behaviour.

I think that both have a different scope. The text to be edited in the textarea should not be treated in any way (except for htmlspecialchars of course)

string_api::string_textarea() should simple

return @htmlspecialchars( $p_string, ENT_COMPAT, 'utf-8') );

The same applies to text fields. See 0008540.

But unlike string_textarea, which is clearly dedicated, the function string_attribute is not. It is used for the text that is displayed and for the text that is used in text fields.

Like string_textarea, it could be called string_text_field or something similar. One method for displaying and one for editing.

Or one for all string_to_be_used_in_a_form().

Of course, all this based on very less knowledge of all the MantisBT internals. The basic idea is something like: "No mixing of different purposes".

dregad

dregad

2024-03-28 08:46

developer   ~0068737

One method for displaying and one for editing.

@hotzeplotz I agree with your analysis in general. I have not looked at the specifics of actual string_api functions.

More than related, it looks like 0008540 is the same issue, no ?

hotzeplotz

hotzeplotz

2024-03-28 09:52

reporter   ~0068738

More than related, it looks like 0008540 is the same issue, no ?

Yes, it's a duplicate.

hotzeplotz

hotzeplotz

2024-03-28 10:39

reporter   ~0068739

We have already functions that looks right for the purpose of displaying something.

/**
 * Prepare a multiple line string for display to HTML
 * @param string $p_string String to be processed.
 * @return string
 */
function string_display( $p_string ) {
    return event_signal( 'EVENT_DISPLAY_TEXT', $p_string, true );
}

/**
 * Prepare a single line string for display to HTML
 * @param string $p_string String to be processed.
 * @return string
 */
function string_display_line( $p_string ) {
    return event_signal( 'EVENT_DISPLAY_TEXT', $p_string, false );
}

// some more "string_display*"

EVENT_DISPLAY_TEXT is handled by our well known plugin :-)

It somehow sounds logical that everything that is to be displayed runs through our plugin and processed with the - soon to be available - HTMLPurifier.

hotzeplotz

hotzeplotz

2024-03-28 11:54

reporter   ~0068740

I think a quick fix is possible, at least for the textareas. PHPstorm found 12 occurrences of string_textarea, which are all undoubtedly used in a textarea.

eg

<textarea class="form-control" <?php echo helper_get_tab_index() ?> id="description" name="description" cols="80" rows="10" required>
<?php echo string_textarea( $f_description ) ?>
</textarea>

If this is wanted, i would open a PR

Just change string_api::string_textarea

from

function string_textarea( $p_string ) {
    return string_html_specialchars( $p_string );
}

to

function string_textarea( $p_string ) {
    return @htmlspecialchars( $p_string, ENT_COMPAT, 'utf-8' );
}

Would also add a Test for it in test/Mantis/StringTest.php

hotzeplotz

hotzeplotz

2024-03-28 11:59

reporter   ~0068741

looking at the code block above … could cry. I can hardly wait until the markdown PR is finally merged. :-)

dregad

dregad

2024-03-28 14:33

developer   ~0068742

Just change string_api::string_textarea from
return string_html_specialchars( $p_string );
@htmlspecialchars( $p_string, ENT_COMPAT, 'utf-8' );

I looked at the history for string_html_specialchars() function.

  1. the use of @ and the comment about Eastern European languages is 17 years old (see MantisBT master 6c16a0d1 and was meant to circumvent a bug (0003675) causing display of charset not supported. At the time Mantis was not using UTF-8. I'm pretty sure this code is no longer useful.

  2. The preg_replace() bit is pretty ancient too (0008174) and it may be worth investigating whether that's still useful / meaningful today, considering that this predates the MantisCoreFormatting plugin

Based on outcome of 2, the correct fix could be to get rid of string_html_specialchars() completely...

hotzeplotz

hotzeplotz

2024-03-29 23:22

reporter   ~0068753

Last edited: 2024-03-30 14:04

Weird. No idea. string_html_specialchars is used directly and indirectly very often. Personally, i had no bad experiences with just use htmlspecialchars($string), without any additional arguments. So a solution could be keep it as the abstraction and change the return value to just htmlspecialchars($p_string).

Two things could happen.

1.) A silent improvment, nobody gets notice of
2.) Something that has been rely on a broken thing for years becomes even more broken.

I can't make a valid advice here.

You also copied it, not that long ago … for a reason i guess. https://github.com/mantisbt/mantisbt/pull/1511/commits/73a493215f29ea319fa4899bc25cc2b9fe61bc60

string_textarea and string_attribute are proxies and especially string_attribute is used very often.

  • string_textarea would be safe to change.
  • string_attribute is the most time used as the name says as a value of an attribute input value="string_atttibute()" but also as the swiss army knife for "escapeing" oder "sanitizing" something to display.

A very quick "review" reveals interesting usages.

https://github.com/mantisbt/mantisbt/blob/master/core/classes/IssueAttachmentTimelineEvent.class.php#L76

A property … Is it user input?

string_html_specialchars( $this->filename )

https://github.com/mantisbt/mantisbt/blob/master/manage_config_revert.php#L78

Both are the same

string_html_specialchars( implode( ', ', $t_revert_vars ) ),
string_attribute( project_get_name( $f_project_id ) )

https://github.com/mantisbt/mantisbt/blob/master/bug_view_inc.php#L1154

seriously? :-) strict types and a type hint "string_attribute(string $p_string)" would help.

$t_bug_id = string_attribute( $p_bug->id );

https://github.com/mantisbt/mantisbt/blob/master/account_sponsor_page.php#L179

Any user input here? Here it is used to display the status, not used for an attribute value. Theoretically string_html_specialchars should be used.

$t_status = string_attribute( get_enum_element( 'status', $t_bug->status, auth_get_current_user_id(), $t_bug->project_id ) );
echo '<td><a title="' . $t_resolution . '"><span class="underline">' . $t_status . '</span> </a></td>';

Vice versa, some of them, where theoretically string_attribute should be used.

<input type="hidden" name="ref" value="<?php echo string_html_specialchars( $f_ref ) ?>" />

I am sure much more wrong/mixed/weird usages are buried somewhere. Makeing this consist is not an easy task.

I still think that everything that should be displayed should go through string_display and everything used in a form should go through … something other.