View Issue Details

IDProjectCategoryView StatusLast Update
0022028mantisbtperformancepublic2016-12-30 15:54
Reporternicolass Assigned Tocproensa  
PrioritynormalSeverityminorReproducibilityalways
Status closedResolutionfixed 
Target Version1.3.5Fixed in Version1.3.5 
Summary0022028: Time out caused by URL processing
Description

Hi. I want to know how I cant limit the "Description" field when I'm triying to report a bug.

This is because in our organization's mantis someone uploaded a WebService return and that blocked the access to the bug, it never charges, when I elevated the parameter max_execution_time in php.ini file to 300 seconds, an error appeared in the note who had the WS return:

Fatal error: Maximum execution time of 300 seconds exceeded in /xxxx/xxx/mantis/core/string_api.php on line 533

I notice that the php process in my server while the page is charging load the proccessor close to 95%

TagsNo tags attached.
Attached Files
nic.txt (308,558 bytes)

Activities

atrol

atrol

2016-12-09 10:51

developer   ~0054720

This might be caused by an old PHP version.
Which one do you use?

Which version of MantisBT do you use?

nicolass

nicolass

2016-12-09 11:20

reporter   ~0054721

Thanks for your answer:

We have mantis in production with:

  • 5.3.26 (Yes, very outdated)
  • Mantis 1.2.8

In our test environment we have:

  • 5.6.26
  • Mantis 1.3.4

The "Fatal Error" is replicated in both environments but I could get the error in the test environment, in production the page never charges.

BTW Sorry for my bad english, is not my native language

atrol

atrol

2016-12-09 17:52

developer   ~0054723

Could you upload the description of the affected issue as an attachment?

nicolass

nicolass

2016-12-12 08:01

reporter   ~0054735

Sorry but I dont understand. Do you want the WS return that someone uploaded as a description?

atrol

atrol

2016-12-12 08:11

developer   ~0054736

Do you want the WS return that someone uploaded as a description?

Right,
but I don't want it as part of a description or note, as it might produce fatal errors.
I would like to have it as an attached file.

nicolass

nicolass

2016-12-12 10:28

reporter   ~0054739

I just uploaded it

atrol

atrol

2016-12-12 10:57

developer   ~0054740

Can you confirm that you don't get any longer the timeout if you deactivate "URL Processing" in "MantisBT Formatting 1.3.0" plugin?

nicolass

nicolass

2016-12-12 12:50

reporter   ~0054742

Yes I deactivated "URL Processing" and the bug charges immediately. This on the version 1.3.4. I don't have access to admin privileges on production mantis, so, those are the same steps for the version 1.2.8?

Thanks

cproensa

cproensa

2016-12-12 18:14

developer   ~0054743

Last edited: 2016-12-12 18:15

I reproduced this, and fixed by adding a length limitation to the email regex.

<pre>
before
/[a-zA-Z0-9.!#$%&'+\/=?^_`{|}~-]+@a-zA-Z0-9?(?:.a-zA-Z0-9?)/"

after
/[a-zA-Z0-9.!#$%&'+\/=?^_`{|}~-]{1,64}@a-zA-Z0-9?(?:.a-zA-Z0-9?)/
</pre>

notice change + into {1,64}
I'm not sure about the 64, there must be some rfc that says what is the limit in the left part of @. The point here is to set a limit at some point to stop the regex. The right part probably also have a maximum, but for me it's harder to apply on this expression.

@dregad, you have more experience with these f--- regexps. any suggestion?

atrol

atrol

2016-12-13 02:02

developer   ~0054747

those are the same steps for the version 1.2.8?
Right

Keep in mind: This is just a workaround as you deactivate some functionality with it,
e.g. something like http://www.mantisbt.org will be no longer a link, but just pure text.

dregad

dregad

2016-12-13 05:13

developer   ~0054750

@cproensa, for the record the e-mail matching Regex we're currently using is taken (almost) verbatim from the HTML5 specification [1].

It is itself derived from the addr-spec specification, as defined in RFC 5322 (Internet message format) [2], which does not define a maximum size for the "local part" of the address (hence the W3C's choice to use '+' in the regex).

That being said, RFC 5321 (SMTP) does define size limits [3] but they are recommendations not strict rules.

I'm not sure about the 64, there must be some rfc that says what is the limit in the left part of @.

The local part SHOULD be <= 64 bytes, so your proposed fix matches the spec's recommendation. I think it's perfectly acceptable to modify the regex as you suggest.

the right part probably also have a maximum

Maximum length for domain SHOULD be 255 bytes.

but for me it's harder to apply on this expression.

I believe implementing that would require a rather complex regex (using positive lookahead), and TBH I'm not really sure it's worth the effort.

[1] https://github.com/mantisbt/mantisbt/blob/master/core/email_api.php
[2] https://tools.ietf.org/html/rfc5322#section-3.4.1
[3] https://tools.ietf.org/html/rfc5321#section-4.5.3.1

cproensa

cproensa

2016-12-13 12:26

developer   ~0054755

Last edited: 2016-12-13 12:27

The local part SHOULD be <= 64 bytes, so your proposed fix matches the spec's recommendation. I think it's perfectly acceptable to modify the regex as you suggest.

By limiting the local part, it will match a substring if the string is longer.
Example, suppose it's limited to 5:
string: "abcdefgh@xxx"
matched: "defgh@xxx"

This may not be desirable, i'd say the ideal is to not match that
However, it's better than what is doing now?

dregad

dregad

2016-12-14 08:43

developer   ~0054759

Last edited: 2016-12-14 08:44

By limiting the local part, it will match a substring if the string is longer.
This may not be desirable, i'd say the ideal is to not match that

True, but to avoid that we'd need to add a "lookbehind" group to the regex, to make sure the previous char is not a valid word char, something like this:

pre>(?&lt;!([a-zA-Z0-9.!#$%&amp;'*+\/=?^_`{|}~-]))[a-zA-Z0-9.!#$%&amp;'*+\/=?^_`{|}~-]{1,64}@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*</pre

Test string (with regex modified to limit 5 per your earlier example):
<pre>
defgh@xxx abcdefgh@xxx asldlskjdflksjdabc defgh@xxx
^ match ^no match ^match
</pre>

Of course the regex is a lot less efficient like this (about 50% more steps required / 30% slower to complete the match), so using the attached bugnote as a test case we get:

  • without lookbehind (regex with 64 chars limit proposed in 0022028:0054743): 2 matches, 614975 steps (892ms)
  • with lookbehind (above regex): 2 matches, 922599 steps (1.15s)

(Tested with https://regex101.com/)

However, it's better than what is doing now?

In any case it's better for sure , since it prevents the regex from timing out / out of memory...

Now the question is whether the performance tradeoff is truly necessary, considering that the size limit we already have in the domain name matching group already causes a similar behavior: 'abcdefgh@<insert 61 chars here>xxx' to match up to and not including the 'xxx'.

cproensa

cproensa

2016-12-14 18:16

developer   ~0054762

Last edited: 2016-12-14 18:19

Thanks for the link. I have learned a lot of these ? syntax...

So, should be this the correct one? (redacted as logical groups)

lookbehind for a non-local char:
(?<![a-zA-Z0-9.!#$%&'+\/=?^_`{|}~-])
lookahead for max of 254 local/domain chars:
(?![@a-zA-Z0-9.!#$%&'
+\/=?^`{|}~-]{254})
local part, max of 64 valid chars:
[a-zA-Z0-9.!#$%&'*+\/=?^
`{|}~-]{1,64}
@
domain parts (whatever it is...):
a-zA-Z0-9?(?:.a-zA-Z0-9?)*
lookahead, not a domain char
(?![a-zA-Z0-9])

With the same test case, it returns 992738 steps, 2 matches

Full regex:
(?<![a-zA-Z0-9.!#$%&'+\/=?^_`{|}~-])(?![@a-zA-Z0-9.!#$%&'+\/=?^`{|}~-]{254})[a-zA-Z0-9.!#$%&'*+\/=?^`{|}~-]+@a-zA-Z0-9?(?:.a-zA-Z0-9?)*(?![a-zA-Z0-9])

dregad

dregad

2016-12-15 09:50

developer   ~0054774

Thanks for the link. I have learned a lot of these ? syntax...

Yes, definitely the best regex resource I've found, both for developing, debugging and educational purposes. Did you try the debugger and unit tests tools ?

lookahead for max of 254 local/domain chars:
(?![@a-zA-Z0-9.!#$%&'*+\/=?^_`{|}~-]{254})

I don't have time to look in detail atm, but I'm not sure what you're trying to achieve with that bit. In particular, where is the 254 limit coming from ? IIRC there's a 255 limit on the domain, not on the whole address.

All regex fun & games aside, let me ask again: is the performance hit caused by a complex regex to avoid a "false match" truly necessary ?

cproensa

cproensa

2016-12-15 10:51

developer   ~0054777

but I'm not sure what you're trying to achieve with that bit. In particular, where is the 254 limit coming from ? IIRC there's a 255 limit on the domain, not on the whole address.

basically, limiting the match group length. When a valid char appears, and there are more than MAX valid chars ahead, this should fail.
254 number from:
here: https://tools.ietf.org/html/rfc5321#section-4.5.3.1.3
and here: https://www.rfc-editor.org/errata_search.php?rfc=3696&amp;eid=1690

is the performance hit caused by a complex regex to avoid a "false match" truly necessary ?

With long "rubbish" text that may cause false matches due some appearances of "@", having long strings formatted as email links is not pretty.

so, truly necessary? no, as shorter matches within the allowed limits would still appear as false positives.
But performance-wise, these checks seem not to add steps over your first look-behind addition.

Do you know a way to check for the 254 lookahead and shortcut a fail, or discard that part right away? probably that would be lighter.

cproensa

cproensa

2016-12-16 13:04

developer   ~0054788

PR: https://github.com/mantisbt/mantisbt/pull/978
stick with the simplest solution.

Related Changesets

MantisBT: master-1.3.x 31c7ecdd

2016-12-16 07:59

cproensa

Committer: dregad


Details Diff
Add limits to email regex to avoid timeouts on very long texts

Adds a limit of 64 chars on the local part of the email address, as
defined by: https://tools.ietf.org/html/rfc5321#section-4.5.3.1

This avoids timeout errors on very long texts with false matches.

Fixes: 0022028
Affected Issues
0022028
mod - core/email_api.php Diff File