
Reduce memory footprint of trash expiry jobs #36602

Merged
micbar merged 3 commits into master from issue_e2925_2 on Jan 28, 2020

Conversation


@mrow4a mrow4a commented Dec 17, 2019

Related issue: https://github.com/owncloud/enterprise/issues/2925

Main problem: the PHP PDO driver leaks memory when used in long-running jobs - https://www.dragonbe.com/2015/07/speeding-up-database-calls-with-pdo-and.html (PDO works well for online requests, where scripts are short-lived). We cannot use a scrollable PDO cursor (and sometimes it is not a good idea anyway), so the whole result set is returned into memory when calling execute, causing a peak in memory allocation. In long-running jobs like trash expiry, that memory accumulates and is never released back to the system.
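As a rough illustration of the buffering problem (a hypothetical sketch using plain PDO and SQLite, not ownCloud's DB layer): fetchAll() materializes the entire result set as one PHP array, while streaming rows through a generator keeps only one row in PHP at a time. Note that MySQL's PDO driver still buffers the full result set client-side unless PDO::MYSQL_ATTR_USE_BUFFERED_QUERY is disabled, which is one reason this PR also chunks the work across script runs instead of relying on row-by-row fetching alone.

```php
<?php
// Hypothetical sketch, not ownCloud code. Table and column names are made up.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE trash (id INTEGER PRIMARY KEY, name TEXT)');
$insert = $pdo->prepare('INSERT INTO trash (name) VALUES (?)');
for ($i = 0; $i < 1000; $i++) {
    $insert->execute(["file$i"]);
}

// Peak allocation: the whole result set lands in one PHP array at once.
$all = $pdo->query('SELECT name FROM trash')->fetchAll(PDO::FETCH_COLUMN);

// Streaming alternative: a generator yields one row at a time, so PHP never
// holds the full set (the driver may still buffer it, as noted above).
function streamNames(PDO $pdo): \Generator {
    $stmt = $pdo->query('SELECT name FROM trash');
    while (($row = $stmt->fetch(PDO::FETCH_ASSOC)) !== false) {
        yield $row['name'];
    }
}

$count = 0;
foreach (streamNames($pdo) as $name) {
    $count++;
}
```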

We also leak memory steadily in the calls below when they are run 1000 times for 1000 users:

\OC_Util::tearDownFS();
\OC_Util::setupFS($user);

Solution:

  • reduce the memory footprint where possible for a single user

  • to avoid holding on to memory that is no longer needed after producing a large result set, the PHP script has to terminate. Thus, do not process all users within one PHP script; chunk the expiry across users (chunk by the maximum number of files processed per user?)

  • do not fetch trashbin locations for background jobs (they are not used there)

  • split the trashbin manager (mostly online queries) from the trash expiry manager (background jobs only) to optimize for each case

  • do not retain the results of functions that can return large result sets for the whole duration of the function, causing memory to grow further

  • add callForUsers with limit and offset to allow paging over users

  • add user/account unit tests

  • do not allow cron to process all users at once (run many PHP cron executions instead, chunking the users)

  • add trashbin expiry unit tests
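The chunking idea from the bullets above can be sketched roughly as follows. This is a hypothetical miniature: the class name and the plain-array stand-in for setAppValue/getAppValue are illustrative, while USERS_PER_SESSION mirrors the constant used by the existing ScanFiles background job.

```php
<?php
// Illustrative sketch, not the actual ownCloud job class.
class TrashExpiryJob {
    const USERS_PER_SESSION = 500;

    /** @var array stand-in for IConfig::setAppValue/getAppValue */
    private array $config = [];

    /** @var array full list of user ids, stand-in for the accounts table */
    private array $users;

    public function __construct(array $users) {
        $this->users = $users;
    }

    /** Process one chunk of users per invocation; remember where to resume. */
    public function run(): array {
        $offset = $this->config['cronjob_trash_expiry_offset'] ?? 0;
        $chunk = array_slice($this->users, $offset, self::USERS_PER_SESSION);
        if (count($chunk) < self::USERS_PER_SESSION) {
            // reached the end of the user list: start over on the next run
            $this->config['cronjob_trash_expiry_offset'] = 0;
        } else {
            $this->config['cronjob_trash_expiry_offset'] = $offset + self::USERS_PER_SESSION;
        }
        return $chunk;   // in the real job: expire trash for each user here
    }
}
```

Because the progress marker lives in persistent app config rather than in the script, each cron execution stays short-lived and its memory is released when the process exits.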

@DeepDiver1975 @jvillafanez @PVince81

@mrow4a mrow4a self-assigned this Dec 17, 2019

mrow4a commented Dec 18, 2019

@jvillafanez @DeepDiver1975 @pako81 would 500 users per session be good? It is the same number we use for FileScan - https://github.com/owncloud/core/blob/master/apps/files/lib/BackgroundJob/ScanFiles.php#L98

@mrow4a mrow4a marked this pull request as ready for review December 22, 2019 18:46
@mrow4a mrow4a force-pushed the issue_e2925_2 branch 4 times, most recently from 492cae5 to 7d006b9 Compare December 28, 2019 19:59
@codecov

codecov bot commented Dec 28, 2019

Codecov Report

Merging #36602 into master will increase coverage by 0.00%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff            @@
##             master   #36602   +/-   ##
=========================================
  Coverage     64.69%   64.69%           
- Complexity    19114    19127   +13     
=========================================
  Files          1269     1270    +1     
  Lines         74808    74843   +35     
  Branches       1327     1327           
=========================================
+ Hits          48395    48423   +28     
- Misses        26022    26029    +7     
  Partials        391      391           
Flag         Coverage Δ                 Complexity Δ
#javascript  54.10% <ø> (ø)             0.00 <ø> (ø) ⬆️
#phpunit     65.87% <87.28%> (+<0.01%)  19127.00 <44.00> (+13.00) ⬇️

Impacted Files                                   Coverage Δ                Complexity Δ
apps/files_trashbin/lib/AppInfo/Application.php  36.84% <0.00%> (-31.58%)  2.00% <0.00%> (ø%)
apps/files_trashbin/lib/TrashExpiryManager.php   100.00% <0.00%> (ø)       13.00% <0.00%> (?%)
lib/private/User/AccountMapper.php               83.20% <0.00%> (+2.85%)   36.00% <0.00%> (+5.00%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 1bd1099...c779815. Read the comment docs.

@mrow4a mrow4a force-pushed the issue_e2925_2 branch 4 times, most recently from 454f6d1 to 5fe6767 Compare December 30, 2019 12:50

mrow4a commented Dec 30, 2019

@DeepDiver1975 @jvillafanez MR is ready.

}

if ($limit !== null || $offset !== null) {
$qb->orderBy('user_id'); // needed for predictable limit & offset
Member:

I think it's better to switch to the (account) "id" instead of the "user_id". If a user is added or removed between runs, this could lead to one user being checked twice or more while another is neglected for a long time.
Because the id is an autoincrement column, new users will always be appended at the end, never in between.

Member:

In addition, we can change the "offset" to "initialId".
Instead of SELECT * FROM table WHERE ..... LIMIT limit OFFSET offset we can use SELECT * FROM table WHERE id > initialID ...... LIMIT limit and store the last id for the next run. This way, if a user is removed we won't rescan a previously-scanned user. In addition, we can take advantage of DB indexes
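The keyset ("seek") pagination being suggested can be sketched like this (a hypothetical example using an in-memory SQLite table in place of the accounts table): instead of OFFSET, remember the last id returned and continue from there, which lets the database use the primary-key index directly.

```php
<?php
// Illustrative sketch of keyset pagination, not ownCloud code.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE accounts (id INTEGER PRIMARY KEY, user_id TEXT)');
$insert = $pdo->prepare('INSERT INTO accounts (user_id) VALUES (?)');
foreach (['u1', 'u2', 'u3', 'u4', 'u5'] as $u) {
    $insert->execute([$u]);
}

$stmt = $pdo->prepare(
    'SELECT id, user_id FROM accounts WHERE id > :lastId ORDER BY id LIMIT :limit'
);

$lastId = 0;   // persist this between runs instead of an offset
$pages = [];
do {
    $stmt->bindValue(':lastId', $lastId, PDO::PARAM_INT);
    $stmt->bindValue(':limit', 2, PDO::PARAM_INT);
    $stmt->execute();
    $rows = $stmt->fetchAll(PDO::FETCH_ASSOC);
    if ($rows !== []) {
        $lastId = (int) end($rows)['id'];   // resume point for the next query
        $pages[] = array_column($rows, 'user_id');
    }
} while (count($rows) === 2);
```

Unlike LIMIT/OFFSET, deleting an already-processed row does not shift the remaining pages, so no user is skipped or rescanned between runs.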

Contributor Author:

I think it is better to order by user_id, at least in the scope of this PR. Please have a look at other places in the class: there we order by user_id (it was done that way for some reason, and I would not change it without some consultation). For the second comment, I replied in #36602 (comment).

Member:

As long as the sorting column is indexed, I don't think it matters whether you sort by id or user_id. My point is that the "offset" clause is bad and we should get rid of it (https://www.eversql.com/faster-pagination-in-mysql-why-order-by-with-limit-and-offset-is-slow/)

Taking into account that this is a new function, it could be a good starting point. Maybe the change is bigger than I think, so if you think the change is too big for the PR, we can go with the current code for now.

Contributor Author:

@jvillafanez I am not sure it is a good idea in this case to expose row ids/account ids for paging, as we use the public IUserManager->callForUsers interface - limit and offset are already used in three other functions in this class. If this were some internal implementation for paging data, I would be more than happy to opt for this solution.

I know it is super tempting in this case, but other use-cases may be interested in non-arbitrary pages (e.g. you have a UI with 100 users per page and 1000 pages, and you want to calculate something for page no. 4 - how do you do that when the offset is some magic id?).

Member:

I'm willing to discuss this offline. Anyway, we can keep the limit + offset approach in this case.

$this->config->setAppValue('files_trashbin', 'cronjob_trash_expiry_offset', 0);
} else {
$offset += self::USERS_PER_SESSION;
$this->config->setAppValue('files_trashbin', 'cronjob_trash_expiry_offset', $offset);
Member:

As said (below?), it might be better to store the last account id checked so we can start over from there on the next run, for better consistency when users are added or removed.

Contributor Author:

I do not think it is a good idea. We already use offset and limit in that class, and FileScan uses a similar mechanism - https://github.com/owncloud/core/blob/master/apps/files/lib/BackgroundJob/ScanFiles.php#L98. I would not over-engineer it here if it is not needed.

@mrow4a mrow4a (Contributor Author) commented Jan 27, 2020:

I refactored this part of the code to always remember the last scanned user. If users are added, no expiration needs to be done for them yet. If a user is removed, there probably won't be anything to expire.

return $this->expiration->isEnabled();
}

public function expiryByRetentionEnabled() {
Member:

This might need some docs. The method name is confusing, and I'm not sure when I should use this method versus the expiryEnabled one.

Contributor Author:

Done

if ($this->expiration->isExpired($trashbinEntry->getMtime())) {
Trashbin::delete($trashbinEntry->getName(), $uid, $trashbinEntry->getMtime());
$this->logger->info(
'Remove "' . $trashbinEntry->getName() . '" from "' . $uid . '" trashbin because it exceeds max retention obligation term.',
Member:

String interpolation is preferred over concatenation (https://blog.blackfire.io/php-7-performance-improvements-encapsed-strings-optimization.html). It also might be easier to read.
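The suggestion in miniature (illustrative values, not the actual log call):

```php
<?php
// Both forms build the same string; since PHP 7 the interpolated (encapsed)
// form is compiled more efficiently, and it is arguably easier to read.
$name = 'report.txt';
$uid = 'alice';

$concatenated = 'Remove "' . $name . '" from "' . $uid . '" trashbin.';
$interpolated = "Remove \"{$name}\" from \"{$uid}\" trashbin.";
```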

Contributor Author:

Done

Member:

but not in this line :)

Contributor Author:

Refactored into a class

);

// unset the deleted file as it was already processed
unset($remainingUserTrashbinContent[$key]);
Member:

Confirm this works as expected. Modifying the list you're looping over causes problems in most languages.
I think it works because you're looping over a copy of the array, not the array itself, so modifying the array won't affect the loop.

Contributor Author:

I refactored it to make clearer that this is an associative array (unset just removes that key from the array, which can still be iterated with foreach afterwards)

before: [0 => a, 1 => b, 2 => c]
after: [0 => a, 2 => c]
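A runnable miniature of the behavior under discussion: foreach over a plain PHP array iterates, in effect, over a copy of the array, so unset()ing entries of the original inside the loop is safe and simply removes those keys from the array that remains afterwards.

```php
<?php
$trashbin = [0 => 'a', 1 => 'b', 2 => 'c'];
$seen = [];
foreach ($trashbin as $key => $entry) {
    $seen[] = $entry;
    if ($entry === 'b') {
        // removing the current entry does not disturb the ongoing foreach
        unset($trashbin[$key]);
    }
}
// $seen still holds all three entries; $trashbin keeps keys 0 and 2 only
```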

/**
* @return Quota
*/
protected static function getQuota() {
Member:

I don't think this method is useful. Either move the code to where it's being used or clarify in the phpdocs that the method will always return a new Quota instance.

Contributor Author:

It is used just one function above: https://github.com/owncloud/core/blob/v10.3.2/apps/files_trashbin/lib/Trashbin.php#L778. Let's keep it in the class where it is used.

Member:

The function is being used, but what's the point of a function that just creates a new object? Instead of self::getQuota() you can do new Quota(.....).
Right now I don't see any benefit: the function is only used in one place (it isn't reused) and you can't mock it in the unit tests. Note that this might be confusing in the long run, because 95% of the "get" functions will always return the same instance; they won't create a new instance each time they're called.

Contributor Author:

Done

@micbar micbar (Contributor) left a comment:

Please add changelog item


mrow4a commented Jan 26, 2020

@jvillafanez @micbar @phil-davis while reviewing, please note that the refactor of the trashbin manager was also motivated by being able to test it properly with unit tests and cover all the code there. I also manually tested expiration via background jobs (both by space and by retention).

@jvillafanez jvillafanez (Member) left a comment:

Just a little detail, but I can live with it.

/**
* Tear down owner fs
*/
protected function tearDownFS() {
Member:

this isn't really needed. Just move the \OC_Util::tearDownFS() call inside the run method


// increase the offset to know where to start next
// e.g. in case of crash of this job due to memory
$this->config->setAppValue('files_trashbin', 'cronjob_trash_expiry_offset', $offset + $usersScanned);
Member:

Just saying, you can sell this better as a transactional mechanism to deal with a memory crash.

@micbar micbar self-requested a review January 28, 2020 09:22
@micbar micbar merged commit daf13e2 into master Jan 28, 2020
@delete-merged-branch delete-merged-branch bot deleted the issue_e2925_2 branch January 28, 2020 09:48