View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update |
---|---|---|---|---|---|
0031837 | mantisbt | bugtracker | public | 2022-12-19 06:57 | 2022-12-20 08:52 |
Reporter | ricardoalonsos | Assigned To | |||
Priority | normal | Severity | feature | Reproducibility | always |
Status | acknowledged | Resolution | open | ||
Summary | 0031837: Using disk to store bug/project files for large environments will put thousands of files into the same folder | ||||
Description | If the database is very large (>200 projects, >300000 issues), storing files in the DB is expensive, so storing them on disk is a good option. But there is no simple way to move from DB to disk with the files already organized per project (that requires updating the database directly). In addition, projects with a massive number of attached files will still end up with a single folder holding possibly thousands of files. Dividing the files into folders is the best solution. Because the file names are already randomly generated, it is easy to hash them into folders using the first 2 or 3 letters of their names. I have already implemented this, but it may need further testing, or improvement in how the files are hashed into the folders. The patches I used are attached. | ||||
Tags | patch | ||||
Attached Files | 0001-changed-file-structure-creating-folders-to-avoid-too.patch (3,523 bytes)
From f79953d38e0032ea4e0e6112a5558547ac6cd332 Mon Sep 17 00:00:00 2001 From: Ricardo Alonso <ralonso@redhat.com> Date: Thu, 24 Nov 2022 16:44:15 +0000 Subject: [PATCH] changed file structure, creating folders to avoid too many files into a single folder --- .gitignore | 1 + admin/move_attachments.php | 12 +++++- config/config_inc.php.sample | 83 ------------------------------------ core/file_api.php | 24 ++++++++++- 4 files changed, 34 insertions(+), 86 deletions(-) create mode 100644 .gitignore delete mode 100644 config/config_inc.php.sample diff --git a/admin/move_attachments.php b/admin/move_attachments.php index 682da75..df34fb3 100644 --- a/admin/move_attachments.php +++ b/admin/move_attachments.php @@ -194,7 +194,9 @@ function move_attachments_to_disk( $p_type, array $p_projects ) { $t_data = array(); while( $t_row = db_fetch_array( $t_result ) ) { - $t_disk_filename = $t_upload_path . $t_row['diskfile']; + # first check if filename is on new format already + $t_filepath = adjust_filepath($t_upload_path, $t_row['diskfile']); + $t_disk_filename = $t_filepath . $t_row['diskfile']; if ( file_exists( $t_disk_filename ) ) { $t_status = 'Disk File Already Exists \'' . $t_disk_filename . 
'\''; $t_failures++; @@ -217,7 +219,7 @@ function move_attachments_to_disk( $p_type, array $p_projects ) { } $t_update_result = db_query( $t_update_query, - array( $t_upload_path, $t_row['id'] ) + array( $t_filepath, $t_row['id'] ) ); if( !$t_update_result ) { @@ -242,6 +244,9 @@ function move_attachments_to_disk( $p_type, array $p_projects ) { $t_file['bug_id'] = $t_row['bug_id']; } $t_data[] = $t_file; + + $t_row = null; + gc_collect_cycles(); } } @@ -253,6 +258,9 @@ function move_attachments_to_disk( $p_type, array $p_projects ) { 'data' => $t_data, ); + $t_result = null; + gc_collect_cycles(); + } return $t_moved; } diff --git a/core/file_api.php b/core/file_api.php index 9ccaa70..e344965 100644 --- a/core/file_api.php +++ b/core/file_api.php @@ -934,6 +934,9 @@ function file_add( $p_bug_id, array $p_file, $p_table = 'bug', $p_title = '', $p $t_unique_name = file_generate_unique_name( $t_file_path ); $t_method = config_get( 'file_upload_method' ); + # adjust the path to accomodate the files into smaller folders + $t_file_path = adjust_filepath( $t_file_path, $t_unique_name); + switch( $t_method ) { case DISK: file_ensure_valid_upload_path( $t_file_path ); @@ -1409,4 +1412,23 @@ function file_get_content_type_override( $p_filename ) { */ function file_get_max_file_size() { return (int)min( ini_get_number( 'upload_max_filesize' ), ini_get_number( 'post_max_size' ), config_get( 'max_file_size' ) ); -} \ No newline at end of file +} + +/** + * Adjust the file path to subdivide the uploaded files into folders. + * + * @param string $p_filepath the path to store the files + * @param string $p_filename the name of the file to store + * @return string the adjusted file path if necessary. + * + */ +function adjust_filepath($p_filepath, $p_filename){ + $t_search = DIRECTORY_SEPARATOR . substr( $p_filename, 0, 2 ); + if (strpos( $p_filepath, $t_search) === false ){ + $t_filepath = $p_filepath . substr( $p_filename, 0, 2 ) . 
DIRECTORY_SEPARATOR; + if ( !file_exists( $t_filepath ) ) + mkdir( $t_filepath, 0700 ); + return $t_filepath; + } + return $p_filepath; +} -- 2.38.1 0003-fixing-upload-download-with-new-folder-structure.patch (2,635 bytes)
From 1a07a9b95f1c6de3145063ecfe2218bd41e05739 Mon Sep 17 00:00:00 2001 From: Ricardo Alonso <ricardo.alonso@niit.com> Date: Wed, 14 Dec 2022 09:50:46 +0200 Subject: [PATCH] fixing upload/download with new folder structure --- core/bug_api.php | 2 +- core/file_api.php | 3 ++- file_download.php | 3 ++- 3 files changed, 5 insertions(+), 3 deletions(-) diff --git a/core/bug_api.php b/core/bug_api.php index 9cc5711..2f2be56 100644 --- a/core/bug_api.php +++ b/core/bug_api.php @@ -1763,7 +1763,7 @@ function bug_get_bugnote_stats( $p_bug_id ) { */ function bug_get_attachments( $p_bug_id ) { db_param_push(); - $t_query = 'SELECT id, title, diskfile, filename, filesize, file_type, date_added, user_id, bugnote_id + $t_query = 'SELECT id, title, concat(folder, diskfile) as diskfile, filename, filesize, file_type, date_added, user_id, bugnote_id FROM {bug_file} WHERE bug_id=' . db_param() . ' ORDER BY date_added'; diff --git a/core/file_api.php b/core/file_api.php index e344965..c25c42e 100644 --- a/core/file_api.php +++ b/core/file_api.php @@ -671,6 +671,7 @@ function file_delete( $p_file_id, $p_table = 'bug', $p_bugnote_id = 0 ) { $c_file_id = (int)$p_file_id; $t_filename = file_get_field( $p_file_id, 'filename', $p_table ); + $t_folder = file_get_field( $p_file_id, 'folder', $p_table ); $t_diskfile = file_get_field( $p_file_id, 'diskfile', $p_table ); if( $p_table == 'bug' ) { @@ -681,7 +682,7 @@ function file_delete( $p_file_id, $p_table = 'bug', $p_bugnote_id = 0 ) { } if( DISK == $t_upload_method ) { - $t_local_disk_file = file_normalize_attachment_path( $t_diskfile, $t_project_id ); + $t_local_disk_file = file_normalize_attachment_path( $t_folder . 
$t_diskfile, $t_project_id ); if( file_exists( $t_local_disk_file ) ) { file_delete_local( $t_local_disk_file ); } diff --git a/file_download.php b/file_download.php index 005fe4d..87fa5f3 100644 --- a/file_download.php +++ b/file_download.php @@ -102,6 +102,7 @@ if( false === $t_row ) { /** * @var int $v_bug_id * @var int $v_project_id + * @var string $v_folder * @var string $v_diskfile * @var string $v_filename * @var int $v_filesize @@ -177,7 +178,7 @@ $t_file_info_type = false; switch( $t_upload_method ) { case DISK: - $t_local_disk_file = file_normalize_attachment_path( $v_diskfile, $t_project_id ); + $t_local_disk_file = file_normalize_attachment_path( $v_folder . $v_diskfile, $t_project_id ); if( file_exists( $t_local_disk_file ) ) { $t_file_info_type = file_get_mime_type( $t_local_disk_file ); } -- 2.38.1 | ||||
Just checking, are you aware that you can already define a distinct directory to store attachments for each individual project? This may be sufficient to reduce the number of files in the directory to an acceptable level. See Upload File Path in manage_proj_edit_page.php: by default it is blank, i.e. the project uses the globally defined directory ($g_absolute_path_default_upload_folder).
There's an admin script to do just that (admin/move_attachments_page.php), but admittedly it's quite basic and probably a bit outdated too, as it does not see much usage. That being said, if the projects' attachment paths are already set, I believe it will store the files in the configured directories. Out of curiosity, how many attachments are we talking about here? This is the first time I have heard of this being a problem. And what would be the maximum acceptable number of files in a given directory? I am asking because I wonder whether a simple approach like the one you propose, i.e. using the first 2 or 3 letters of [the attachment file] names, will only delay the problem and may not be enough to actually limit the number of files to an acceptable level. Maybe this needs to be configurable, I don't know. Anyway, thanks for your contribution. This would require review and some testing to ensure it does not break anything for existing systems. |
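To illustrate what a configurable split could look like, here is a minimal sketch that takes the prefix length as a parameter. It is not part of the attached patches: `shard_dir()` and `$p_prefix_len` are hypothetical names, and it assumes the disk file names are hexadecimal strings, so a prefix of n characters yields at most 16^n folders (256 for 2, 4096 for 3).

```php
<?php
/**
 * Build the sub-folder path for an attachment from the first characters
 * of its disk file name. Hypothetical helper, not a MantisBT API.
 *
 * @param string  $p_base_path  Upload folder, with or without trailing separator.
 * @param string  $p_filename   Disk file name (assumed to be a hex string).
 * @param integer $p_prefix_len Number of leading characters used as folder name.
 * @return string Folder path ending with a directory separator.
 */
function shard_dir( $p_base_path, $p_filename, $p_prefix_len = 2 ) {
	if( $p_prefix_len < 1 || strlen( $p_filename ) < $p_prefix_len ) {
		throw new InvalidArgumentException( 'Invalid prefix length or file name too short' );
	}
	# Normalize the base path so exactly one separator precedes the prefix.
	$t_base = rtrim( $p_base_path, '/\\' ) . DIRECTORY_SEPARATOR;
	return $t_base . substr( $p_filename, 0, $p_prefix_len ) . DIRECTORY_SEPARATOR;
}

# Example paths and file name are made up for illustration.
echo shard_dir( '/var/mantis/attachments', 'a3f09c1d', 2 ), "\n";
echo shard_dir( '/var/mantis/attachments/', 'a3f09c1d', 3 ), "\n";
```

Unlike `adjust_filepath()` in the first patch, this sketch always appends the prefix instead of searching the path with `strpos()`. That check appears to skip the split whenever some parent folder happens to start with the same two characters as the file name. The sketch only computes the path; creating the folder and storing it in the database are left out.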
|
I'm aware of the option to separate per project. But the problem is that we are migrating from an old version, where the storage was in the DB. We have 200+ projects, and updating every project to use its own folder is manual work. We have 100000+ attachments, but some projects have none and others have 30000+, so they are still not evenly distributed. Using 2 letters (246 folders on the first level), the files were better distributed, with fewer than 400 per folder. But it would be useful to have some tool to manipulate and reorganize/rearrange this structure if necessary.
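A quick way to check whether 2 or 3 letters is enough for a given installation is to count the files per prefix before moving anything. The sketch below is only an illustration: `count_per_prefix()` is a hypothetical helper, and the names are simulated with `md5()` on the assumption that the real disk file names are hex strings of similar randomness. In practice the list would come from the `diskfile` column.

```php
<?php
/**
 * Count how many file names fall into each prefix folder.
 * Hypothetical helper for evaluating a prefix length; not a MantisBT API.
 *
 * @param array   $p_filenames  Disk file names.
 * @param integer $p_prefix_len Number of leading characters used as folder name.
 * @return array Map of prefix => number of files, sorted by prefix.
 */
function count_per_prefix( array $p_filenames, $p_prefix_len = 2 ) {
	$t_counts = array();
	foreach( $p_filenames as $t_name ) {
		$t_prefix = substr( $t_name, 0, $p_prefix_len );
		if( !isset( $t_counts[$t_prefix] ) ) {
			$t_counts[$t_prefix] = 0;
		}
		$t_counts[$t_prefix]++;
	}
	# Sort keys as strings, since PHP casts prefixes like "12" to integer keys.
	ksort( $t_counts, SORT_STRING );
	return $t_counts;
}

# Simulate 100000 attachments with md5-style names (assumption: names are hex).
$t_names = array();
for( $i = 0; $i < 100000; $i++ ) {
	$t_names[] = md5( 'attachment-' . $i );
}

foreach( array( 2, 3 ) as $t_len ) {
	$t_counts = count_per_prefix( $t_names, $t_len );
	printf( "prefix length %d: %d folders, max %d files, average %.1f\n",
		$t_len, count( $t_counts ), max( $t_counts ),
		array_sum( $t_counts ) / count( $t_counts ) );
}
```

With evenly distributed hex names and every folder in use, the averages work out to 100000 / 256 ≈ 391 files per folder for 2 letters and 100000 / 4096 ≈ 24 for 3 letters, which is in line with the "fewer than 400 per folder" figure reported above.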
|
|
Actually not such a big effort, something along these lines should be enough to do the trick
I agree this might be useful functionality.
This might be worth implementing as default behavior actually. |
|