View Issue Details

IDProjectCategoryView StatusLast Update
0025701Plugin - EmailReportingGeneralpublic2019-04-25 17:50
ReporterDaveid Assigned ToSL-Gundam  
PrioritynormalSeverityminorReproducibilityhave not tried
Status resolvedResolutionfixed 
Summary0025701: Slow IMAP parsing
Description

The plugin has become very slow, taking 0000060:0000120 seconds to run. I have narrowed it down to the parsing the response of the FETCH imap calls

I see it makes 739 FETCH calls, the first of which seems to be for a message for March 13th, it starts as follows

* 1 FETCH (RFC822.SIZE 1050717 UID 10334 FLAGS (\Seen \Deleted) ENVELOPE ("Wed, 13 Mar 2019 11:19:56 +0000" 

the last is from an email from today.

90% of the run time seems to be parsing these 739 responses, i have observed as such with logging also.

I get the feeling that it's not supposed to get getting 739 messages every time and in fact these are supposed to be marked somewhere as done?

Not sure when this started happening but any help is appreciated :)

mantis is version 2.19.0, EmailReporting is github master, email server is office365

have attached some trace / profiling info

TagsNo tags attached.
Attached Files
output2.svg (110,386 bytes)   
output2.svg (110,386 bytes)   
slowparsing.png (24,984 bytes)   
slowparsing.png (24,984 bytes)   
output.481.svg (118,168 bytes)   
output.481.svg (118,168 bytes)   

Activities

Daveid

Daveid

2019-04-17 14:56

reporter   ~0061954

0000060:0000120 was supposed to be approx 120 seconds ( ~120s)

SL-Gundam

SL-Gundam

2019-04-17 15:08

reporter   ~0061955

Last edited: 2019-04-17 15:10

Well the issue is that for every message in a folder EmailReprting needs to retrieve the status flags and verify to see if the message needs to be processed (this includes messages marked as deleted)

So if you've turned off http://www.mantisbt.org/wiki/doku.php/mantisbt:plugins:emailreporting#mail_delete then this can add up quite a bit
The communication with IMAP is slow, the actual PHP processing time is fast

Now it happens that recently with had an issue: https://mantisbt.org/forums/viewtopic.php?f=13&t=26100
The fix for that was that we needed to cache the status flags of the emails in IMAP folders
https://github.com/mantisbt-plugins/EmailReporting/commit/01eb16e74720d485a3dc7ecea3830e404d2ae56b

This fix resulted in a significant performance improvement concerning IMAP communication
https://mantisbt.org/forums/viewtopic.php?p=67708#p67708

So my advise is update to 0.11.0-DEV (https://github.com/mantisbt-plugins/EmailReporting) and see whether thats also true for your situation

SL-Gundam

SL-Gundam

2019-04-17 15:13

reporter   ~0061956

If possible please also provide the same graphs for the new version so that we can see how much performance has improved and if there are other areas that might need looking at

Daveid

Daveid

2019-04-17 16:20

reporter   ~0061957

thanks for the speedy reply! sorry to disappoint but I updated to master after initially finding it was slow with imap, i notice that caching was enabled so hope that it might help.
Also I do actually think the parsing is slow, the CPU is sat at 100% and i added logging into the parsing function to confirm

I added error_log($str); at this line
https://github.com/mantisbt-plugins/EmailReporting/blob/master/core_pear/Net/IMAPProtocol.php#L3438
and added error_log('done'); at the bottom of the function
https://github.com/mantisbt-plugins/EmailReporting/blob/master/core_pear/Net/IMAPProtocol.php#L3544

when running i see the many responses fly past quickly with the response immediately followed by done. when the 700 odd FETCH messages fly passed the done message takes 114 seconds to appear after the response, and the cpu is 100% the entire time

SL-Gundam

SL-Gundam

2019-04-17 18:05

reporter   ~0061958

I'm sorry to say i don't have any test situation where it takes that long.
Was the runtime the same before the update? or has it improved with the caching but is still bad?

The function in question is part of a PHP PEAR package called Net_IMAP EmailReporting uses. Its hasn't had any official updates since 2014
https://pear.php.net/package/Net_IMAP
https://github.com/pear/Net_IMAP

I've only applied changes to that package when compatibility required it.
Since i don't maintain the Net_IMAP code i would have to debug it on a situation where the problem is present

I still have plans to start using https://www.php.net/manual/en/book.imap.php
But that will not be soon at all.

Is it possible you provide me the content of the folder so that i might be able to reproduce the issue? Otherwise you will have to dig in to this yourself and find out what must be done to improve performance
Keep in mind that EmailReporting needs to know the state of an email to decide to process it or not. So that will need to be checked for all emails in a folder.
Also EmailReporting doesn't keep a local cache between runs like a normal mail client would.

The function that is at the source of this is EmailReporting's getListing function
Try changing 'IMAP' on this line to anything else like 'IMAP1' -> https://github.com/mantisbt-plugins/EmailReporting/blob/master/core/mail_api.php#L541
This way it will fallback to the order Net_IMAP or the mail server returns the emails in. So EmailReporting will not process the oldest to the newest email necessarily.
I'm curious what the performance difference will be

SL-Gundam

SL-Gundam

2019-04-17 18:07

reporter   ~0061959

Maybe if this fixes it we will need to perform the sorting of the emails after processing the flags...

SL-Gundam

SL-Gundam

2019-04-17 19:28

reporter   ~0061960

Actually already found another method to fix this if you confirm the solution

Please let me know if this works for you

0001-Optimized-IMAP-email-sorting.patch (2,997 bytes)   
From b6638cf499e41b6979b86f78975c9a84d66964e8 Mon Sep 17 00:00:00 2001
From: SL-Gundam <slgundam@gmail.com>
Date: Thu, 18 Apr 2019 01:22:55 +0200
Subject: [PATCH] Optimized IMAP email sorting Fix email processing order

---
 core/mail_api.php | 37 +++++++++++--------------------------
 doc/CHANGELOG.txt |  2 ++
 2 files changed, 13 insertions(+), 26 deletions(-)

diff --git a/core/mail_api.php b/core/mail_api.php
index fc531d5..02d2ff0 100644
--- a/core/mail_api.php
+++ b/core/mail_api.php
@@ -346,7 +346,7 @@ class ERP_mailbox_api
 
 						if ( !$this->pear_error( 'Retrieve list of messages', $t_ListMsgs ) )
 						{
-							while ( $t_Msg = array_pop( $t_ListMsgs ) )
+							while ( $t_Msg = array_shift( $t_ListMsgs ) )
 							{
 								$t_emailresult = $this->process_single_email( $t_Msg[ 'msg_id' ] );
 
@@ -445,7 +445,7 @@ class ERP_mailbox_api
 											{
 												$t_flags = $this->_mailserver->getFlags();
 
-												while ( $t_Msg = array_pop( $t_ListMsgs ) )
+												while ( $t_Msg = array_shift( $t_ListMsgs ) )
 												{
 													$t_isDeleted = $this->isDeleted( $t_Msg[ 'msg_id' ], $t_flags );
 
@@ -536,35 +536,20 @@ class ERP_mailbox_api
 	# Needed a workaround to sort IMAP emails in a certain order
 	private function getListing()
 	{
-		$t_ListMsgs = NULL;
+		$t_ListMsgs = $this->_mailserver->getListing();
 
-		if ( $this->_mailbox[ 'mailbox_type' ] === 'IMAP' )
+		if ( !PEAR::isError( $t_ListMsgs ) )
 		{
-			$t_getSummary = $this->_mailserver->getSummary();
-
-			if ( !$this->pear_error( 'IMAP Get folder info', $t_getSummary ) )
+			if ( $this->_mailbox[ 'mailbox_type' ] === 'IMAP' )
 			{
-				$t_nummsg = count( $t_getSummary );
-				$t_ListMsgs = array();
-				for ( $i = 0; $i < $t_nummsg; $i++ )
-				{
-					$t_getSummary[ $i ][ 'DATE' ] = strtotime( $t_getSummary[ $i ][ 'DATE' ] );
-					// If strtotime fails we default back to the current time. 
-					if ( $t_getSummary[ $i ][ 'DATE' ] === FALSE )
-					{
-						$t_getSummary[ $i ][ 'DATE' ] = time();
-					}
-
-					$t_ListMsgs[ $t_getSummary[ $i ][ 'DATE' ] ] = array( 'msg_id' => (int) $t_getSummary[ $i ][ 'MSG_NUM' ] );
-				}
-
-				krsort( $t_ListMsgs );
+				$t_ListMsgs = array_column( $t_ListMsgs, NULL, 'uidl' );
+			}
+			else
+			{
+				$t_ListMsgs = array_column( $t_ListMsgs, NULL, 'msg_id' );
 			}
-		}
 
-		if ( $t_ListMsgs === NULL )
-		{
-			$t_ListMsgs = $this->_mailserver->getListing();
+			ksort( $t_ListMsgs );
 		}
 
 		return( $t_ListMsgs );
diff --git a/doc/CHANGELOG.txt b/doc/CHANGELOG.txt
index b136245..1f38eec 100644
--- a/doc/CHANGELOG.txt
+++ b/doc/CHANGELOG.txt
@@ -12,6 +12,8 @@ Sep 2018 - EmailReporting-0.11.0
 	- Limited max length of message id (#25433)
 	- Add support for Windows-1257 charset
 	- Enable the caching of IMAP flags
+	- Optimized IMAP email sorting
+	- Fix email processing order
 
 May 2018 - EmailReporting-0.10.1
 	- Added error if, on installation, 'api/soap/mc_file_api.php' is missing
-- 
2.21.0.windows.1

Daveid

Daveid

2019-04-18 04:25

reporter   ~0061966

With that patch it's down to 19 seconds total run time, undoing the patch takes it back up to 120ish
The response being parsed is much smaller now, the vast majority of the the data is gone from each item, but there does seem to be two fetches now.

Daveid

Daveid

2019-04-18 04:38

reporter   ~0061967

since you asked IMAP1 (alone) takes the total runtime to approx 19 seconds also

Daveid

Daveid

2019-04-18 04:46

reporter   ~0061968

Comparison of the profiles (incl column is milliseconds, and profiling makes it slower)

slowparsing-2.png (24,984 bytes)   
slowparsing-2.png (24,984 bytes)   
Daveid

Daveid

2019-04-18 04:59

reporter   ~0061969

wrong file

imap_compare.png (138,881 bytes)   
imap_compare.png (138,881 bytes)   
Daveid

Daveid

2019-04-18 05:00

reporter   ~0061970

sorry both the new runs are IMAP1 :) my bad. will get the results with the patch

output.63299.svg (199,104 bytes)   
output.63299.svg (199,104 bytes)   
SL-Gundam

SL-Gundam

2019-04-18 05:32

reporter   ~0061971

Changes have been pushed to master
https://github.com/mantisbt-plugins/EmailReporting/commit/b79122de346f313827bd591ff69ec56c8490d09e

This commit is a little bit different from the patch so update your installation as well

I will be examining the provided trace and profiling info later tonight

Thank you

Daveid

Daveid

2019-04-18 06:29

reporter   ~0061974

I did some poking around
the first second if this if is pretty wasteful, substr is really expensive and should be called as little as possible
i noted that that after the patch BODY does not appear in the basic responses and tried removing the search for BODY. The total runtime is down to 11 seconds.
https://github.com/mantisbt-plugins/EmailReporting/blob/master/core_pear/Net/IMAPProtocol.php#L2797

Obviously BODY is probably needed for the plugin to work however creating ($len - 5)*2 sub strings to find it seems silly

Daveid

Daveid

2019-04-18 06:43

reporter   ~0061975

Last edited: 2019-04-18 07:00

i have a patch that makes this section more efficient
(edit, maybe this patch does not work :sweat:)
(edit 2, i stand by the patch! i just failed to fetch master after resetting)
profile results show getSubstr calls down from 160,000 to 49,000

fewer_substr.patch (1,655 bytes)   
diff --git a/core_pear/Net/IMAPProtocol.php b/core_pear/Net/IMAPProtocol.php
index ced0c75..7791a82 100644
--- a/core_pear/Net/IMAPProtocol.php
+++ b/core_pear/Net/IMAPProtocol.php
@@ -2794,23 +2794,25 @@ class Net_IMAPProtocol
 
         default:
             for ($pos = 0; $pos < $len; $pos++) {
-                if ($this->_getSubstr($str, 0, 5) == 'BODY[' 
-                    || $this->_getSubstr($str, 0, 5) == 'BODY.') {
+                $c = $str[$pos];
+                if ($c =='B' &&
+                      ($this->_getSubstr($str, 0, 5) == 'BODY['
+                       || $this->_getSubstr($str, 0, 5) == 'BODY.')) {
                     if ($str[$pos] == ']') {
                         $pos++;
                         break;
                     }
-                } elseif ($str[$pos] == ' ' 
-                          || $str[$pos] == "\r" 
-                          || $str[$pos] == ')' 
-                          || $str[$pos] == '(' 
-                          || $str[$pos] == "\n" ) {
+                } else if ($c == ' '
+                          || $c == "\r"
+                          || $c == ')'
+                          || $c == '('
+                          || $c == "\n" ) {
                     break;
                 }
-                if ($str[$pos] == "\\" && $str[$pos + 1 ] == ' ') {
+                if ($str[$pos] == '\\' && $str[$pos + 1 ] == ' ') {
                     $pos++;
                 }
-                if ($str[$pos] == "\\" && $str[$pos + 1 ] == "\\") {
+                if ($str[$pos] == '\\' && $str[$pos + 1 ] == '\\') {
                     $pos++;
                 }
             }
fewer_substr.patch (1,655 bytes)   
Daveid

Daveid

2019-04-18 07:05

reporter   ~0061976

nope, the patch is wrong :) but close

Daveid

Daveid

2019-04-18 07:10

reporter   ~0061977

fixed the patch, i though that it was checking whether BODY was at $pos, but in fact it was just if $str started with BODY, so just need to do it once

fewer_substr-2.patch (1,433 bytes)   
diff --git a/core_pear/Net/IMAPProtocol.php b/core_pear/Net/IMAPProtocol.php
index ced0c75..bd42a27 100644
--- a/core_pear/Net/IMAPProtocol.php
+++ b/core_pear/Net/IMAPProtocol.php
@@ -2793,18 +2793,21 @@ class Net_IMAPProtocol
             break;
 
         default:
+            $startsWithBody = $str[0] == 'B'
+              && ($this->_getSubstr($str, 0, 5) == 'BODY['
+                  || $this->_getSubstr($str, 0, 5) == 'BODY.');
             for ($pos = 0; $pos < $len; $pos++) {
-                if ($this->_getSubstr($str, 0, 5) == 'BODY[' 
-                    || $this->_getSubstr($str, 0, 5) == 'BODY.') {
-                    if ($str[$pos] == ']') {
+                $c = $str[$pos];
+                if ($startsWithBody) {
+                    if ($c == ']') {
                         $pos++;
                         break;
                     }
-                } elseif ($str[$pos] == ' ' 
-                          || $str[$pos] == "\r" 
-                          || $str[$pos] == ')' 
-                          || $str[$pos] == '(' 
-                          || $str[$pos] == "\n" ) {
+                } elseif ($c == ' '
+                          || $c == "\r"
+                          || $c == ')'
+                          || $c == '('
+                          || $c == "\n" ) {
                     break;
                 }
                 if ($str[$pos] == "\\" && $str[$pos + 1 ] == ' ') {
fewer_substr-2.patch (1,433 bytes)   
SL-Gundam

SL-Gundam

2019-04-18 17:55

reporter   ~0061979

If you use things like this $str[0] you will also need to add isset( $str[0] ) or !empty( $str ) to avoid a notice level error
substr does not need that

Daveid

Daveid

2019-04-18 18:40

reporter   ~0061980

Can't see it at the moment but I think the string is guaranteed to have some length at that point and since 50% of the run time is in those two substr calls it's worth it.

SL-Gundam

SL-Gundam

2019-04-25 17:50

reporter   ~0061995

Sorry it took me some time but was finally able to verify the code
https://github.com/mantisbt-plugins/EmailReporting/commit/a74e858c2833561144d95eaa1bc74ca6150ec9f4

Thank you.

If you are willing to do some more work for IMAP, i could use your help with adding support for the PHP IMAP extension
https://www.php.net/manual/en/book.imap.php
Initially the plan is to support both PEAR packages and the extension. Users will be able to switch between the two using the EmailReporting configuration options
Though i don't know whether its feasible supporting PEAR and PHP IMAP side by side