Database Optimization

This page describes some database optimisations for SQL Server 2000. These are based on Mantis 1.0.2

The Problem

The project I was working on used a Mantis installation based on Windows 2003 Server/IIS/PHP and SQL Server. The filters were starting to become unusably slow, particularly the view_all_bug_page. Pressing the “Reset Filter” button could produce a delay of 40-50 seconds before a page appeared. This problem was getting worse as we added more and more issues.

The Investigation

The investigation of the problem centered on the filter code in core/filter_api.php. The function filter_get_bug_rows() is the core of the bug filtering system.

The existing filter_get_bug_rows() function is 813 lines of code - arguably a candidate for some “refactoring with extreme prejudice” however that is beyond the scope of what I was trying to do here, which was to make MANTIS work acceptably under SQL server with more than 1000 open issues.

What is interesting is the area after line 724 in the original file. What happens here is that a list of relevant issues is created in the variable $t_id_array. This is then converted into a where clause at line 756. This produces a SQL statement which will be used to query mantis_bug_table of the form: -

SELECT ... FROM mantis_bug_table JOIN ...... 
  WHERE 
  mantis_bug_table.id IN ( 1, 10, 11, ...... long list OF bug ids ... )

The key observation that I made was that SQL Server hates long in (…) lists. When your project has several thousand open issues, this in (…) clause will contain several thousand numbers. The result, in SQL Server at least, is misery.

The Solution

Given that the problem was caused by the excessively long in() clause, my solution was to convert the in(…) clause to a temporary table, so that the SQL shown above becomes…

SELECT ... FROM mantis_bug_table JOIN ...... 
 #filtered_ids fid ON fid.id = mantis_bug_table.id

Where the #filtered_ids table contains the list of bug_ids that were previously in the in(…) clause. This temporary table id also indexed and clustered on the id field so that we can squeeze the last drop of performance out of the table index.

The full text of my solution is shown below: -

Firstly replace the code from line 724 to 764 with the following : -

		# Possibly do two passes. First time, grab the IDs of issues that match the filters. Second time, grab the IDs of issues that
		# have bugnotes that match the text search if necessary.

		#
		# Using a long list of bug id's for subsequent select statements does not seem to work well in SQL server
		# so.... we will select the list of bug id's into a temp table
		#
		db_query("if exists (select * from tempdb..sysobjects where id = object_id(N'tempdb..#filtered_ids') )".
		         "drop table [dbo].[#filtered_ids]" );
		db_query( "create table #filtered_ids ( fid int not null)" );
		db_query( "create clustered index ix_filtered_ids on #filtered_ids(fid)" );
 
		$t_id_array = array();
 
		$t_id_where = $t_where;
		$t_id_join = $t_join;
		if ( !is_blank( $t_id_where ) && !is_blank( $t_textsearch_where_clause ) ) {
				$t_id_where = $t_id_where . ' AND ' . $t_textsearch_where_clause;
		}
		$query  = "insert into #filtered_ids SELECT DISTINCT $t_bug_table.id as id
						$t_from
						$t_id_join
						$t_id_where";
		db_query( $query );
 
		$t_id_where = $t_where;
		$t_id_join = $t_join;
		if ( !is_blank( $t_textsearch_wherejoin_clause ) ) {
			$t_id_where = $t_id_where . ' AND ' . $t_textsearch_wherejoin_clause;
			$t_id_join = $t_id_join . " INNER JOIN $t_bugnote_table ON $t_bugnote_table.bug_id = $t_bug_table.id";
			$t_id_join = $t_id_join . " INNER JOIN $t_bugnote_text_table ON $t_bugnote_text_table.id = $t_bugnote_table.bugnote_text_id";
			$query  = "insert into #filtered_ids SELECT DISTINCT $t_bug_table.id as id
						$t_from
						$t_id_join
						$t_id_where";
			db_query( $query );
		}
 
		$query  = "SELECT count(DISTINCT(fid)) as nrows from #filtered_ids";
		$result = db_query( $query );
		$row = db_fetch_array( $result);
		$row_count = $row['nrows'];
 
		$t_from = 'FROM ' . $t_bug_table;
 
		# Get the total number of bugs that meet the criteria.
		$bug_count = $row_count; #count( $t_id_array );

Then at line 853 change the SELECT statement to :-

	$t_order = " ORDER BY " . implode( ', ', $t_order_array );
	$t_select	= implode( ', ', array_unique( $t_select_clauses ) );
 
        # Modified to optimize it for SQL Server
        # Join to the temp table of ids that we have created.
        $t_join .= " JOIN #filtered_ids on #filtered_ids.fid = $t_bug_table.id ";
	$query2  = "select DISTINCT $t_select
				$t_from
				$t_join
				$t_order";
 
	# Figure out the offset into the db query
	#

Alternative solution by RZG

NOTE BY RZG: My IMHO simpler solution was to use subqueries. With just 2 small changes, we were able to fix the issues described on my company Mantis, which has well over 10000 issues on it. See Issue 7516.

How much difference does it make?

I will admit that I haven't made any exact measurements of the difference that this optimization makes with out current load of over 6400 open issues.

What I do know is that when we had about 1000 open issues, the view_all_bug_page was regularly taking over 40s to appear. (Sometimes the SQL caching would work, and it would appear straight away, other times, when a newly added bug changed the text of that in(…) clause the SQL would change, and everything would slow down again…) To add to that the time taken to display the view_all_bug_page seemed to be increasing in a non-linear fashion. The result for the project would have been catastrophic as the number of issues continued to rise.

In addition to the above SQL optimisation, there were some significant benefits to be gained from adding cacheing for custom field values that we have displayed in the view_all_bugs_page.

We now have over 6400 open issues. The view_all_bugs_page never takes more than 5 seconds to appear. Pages other than the first page such as accessed by clicking on the “Next” link can be accessed in under 3 seconds.

Future developments

This modification has a number of limitations, particularly in that it is very much focused on SQL Server syntax. Ideally we would have an architecture in place in filter_api.php that allowed different database backends to use their own optimised search methods. This would probably require the implementation of architectural changes to MANTIS to allow for the use of variations in the query strategy with different SQL dialects.

For SQL server, I think we would want to look at using a stored procedure to return the required dataset for a particular page in view_all_bugs_page.

Table of Contents