
Implement teardown for gh-ost library #479

Merged
shlomi-noach merged 17 commits into github:teardown-contrib from Gusto:nm-refactor-migration-context
Dec 7, 2017

Conversation

@nikhilmat
Contributor

@nikhilmat commented Aug 28, 2017

Fixes #465

This PR adds explicit cleanup to gh-ost for when the library is consumed by another application and might be executed multiple times (for example, by a webserver). This stops any infinite goroutines that have started once the migration has finished, closes any DB connections it has opened, and ensures that the migration context that is passed around is not being globally accessed - instead it instantiates a new one for each migration run and passes it around.

Please let me know how this looks to you, and happy to make any changes to improve it!

It also pulls in the change from outbrain-inc/golib#4.

@shlomi-noach
Contributor

Thank you! Just as a heads up, it'll take me a few days to review this; sorry for the delay.

@nikhilmat
Contributor Author

@shlomi-noach do you think you will get a chance to review this this week?

@shlomi-noach
Contributor

this is the week!

Contributor

@shlomi-noach left a comment

Overall this looks very good -- see the finishedMigrating concurrency concern; this would need to change.
When this is changed, I can send this PR for testing.

Thank you!

knownDBsMutex.Lock()
defer func() {
knownDBsMutex.Unlock()
}()
Contributor

simpler:

defer knownDBsMutex.Unlock()

Contributor Author

of course, thanks!

this.eventsStreamer.Teardown()
}

sqlutils.ResetDBCache()
Contributor

👍

db *gosql.DB
singletonDB *gosql.DB
migrationContext *base.MigrationContext
finishedMigrating bool
Contributor

This variable is accessed read/write by different goroutines. You'd need to change it to int64 and use atomic.StoreInt64 and atomic.LoadInt64; see various such examples throughout the code.
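To illustrate the suggested pattern, here is a minimal standalone sketch of an int64 flag read and written via sync/atomic; the Migrator type and method names below are illustrative stand-ins, not gh-ost's actual code:

```go
package main

import (
	"fmt"
	"sync/atomic"
	"time"
)

// Migrator is an illustrative stand-in for gh-ost's migrator.
// finishedMigrating is an int64 rather than a bool so it can be
// read and written atomically by different goroutines.
type Migrator struct {
	finishedMigrating int64
}

// teardown signals all background goroutines to stop.
func (m *Migrator) teardown() {
	atomic.StoreInt64(&m.finishedMigrating, 1)
}

// isFinished is a race-free read of the flag.
func (m *Migrator) isFinished() bool {
	return atomic.LoadInt64(&m.finishedMigrating) == 1
}

// heartbeatLoop shows a ticker loop exiting cleanly once teardown
// has been called, instead of ticking forever.
func (m *Migrator) heartbeatLoop(interval time.Duration) {
	tick := time.Tick(interval)
	for range tick {
		if m.isFinished() {
			return
		}
		// ... inject heartbeat ...
	}
}

func main() {
	m := &Migrator{}
	go m.heartbeatLoop(10 * time.Millisecond)
	fmt.Println(m.isFinished()) // false
	m.teardown()
	fmt.Println(m.isFinished()) // true
}
```

The same load-guarded loop shape applies to each of the ticker loops flagged in the review hunks below.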


heartbeatTick := time.Tick(time.Duration(this.migrationContext.HeartbeatIntervalMilliseconds) * time.Millisecond)
for range heartbeatTick {
if this.finishedMigrating {
Contributor

unsafe operation; concurrency

log.Debugf("Tearing down...")
this.db.Close()
this.singletonDB.Close()
this.finishedMigrating = true
Contributor

unsafe operation; concurrency; use atomic.StoreInt64


handledChangelogStates map[string]bool

finishedMigrating bool
Contributor

see concurrency concerns above

this.printStatus(ForcePrintStatusAndHintRule)
statusTick := time.Tick(1 * time.Second)
for range statusTick {
if this.finishedMigrating {
Contributor

unsafe operation; concurrency

go func() {
ticker := time.Tick(1 * time.Second)
for range ticker {
if this.finishedMigrating {
Contributor

unsafe operation; concurrency

return nil
}
for {
if this.finishedMigrating {
Contributor

unsafe operation; concurrency

}

func (this *Migrator) teardown() {
this.finishedMigrating = true
Contributor

unsafe operation; concurrency

@nikhilmat
Contributor Author

@shlomi-noach thanks for the notes on the concurrency concerns - those have been fixed!

I've been doing some additional testing over the past week and came across an issue with the DB cache in sqlutils when running concurrent migrations. Since we are now closing our DB connections at the end of each migration, if another migration is still running when one ends, its connection gets pulled out from underneath it, because the cache has them share a connection pool.

Do you have any objections to attaching the DB connection pools that connect to migrationContext.DatabaseName and the information_schema to the migration context, and removing the database cache in sqlutils? That way, we avoid creating too many pools by always referencing them on the context, and they will be scoped to each migration so migrations don't interfere when one finishes and closes its pool.

@nikhilmat force-pushed the nm-refactor-migration-context branch from 92784e7 to 5788ab5 on September 22, 2017 at 22:18
@nikhilmat force-pushed the nm-refactor-migration-context branch from 6822be1 to 84bdfdb on September 22, 2017 at 23:06
@nikhilmat
Contributor Author

@shlomi-noach, I implemented that proposed change here: 84bdfdb - let me know what you think of the approach! I believe with these changes, each migration will maintain its own connections and clean them up at the end, allowing migrations to run concurrently.

Thanks again for your review, and sorry for the delay with the above changes! Will be sure to get to any feedback you have shortly :)

@shlomi-noach
Contributor

@nikhilmat I'll be in a conference this week so will only take a look next week. But meanwhile, Re: thread pool, please see #491 and #482, work in progress by @zmoazeni to address that.

@zmoazeni
Contributor

@shlomi-noach @nikhilmat hey folks. I'm not sure how the thread pool changes affect this PR, but by all means if you need any help just ask.

As far as I understand the codebase and my threadpool PR, I don't explicitly create a new connection. It just reuses the existing ones. So I think they will both go hand-in-hand. But if they don't, just let me know.

@shlomi-noach
Contributor

@nikhilmat

Do you have any objections to attaching the DB connection pools that connect to migrationContext.DatabaseName and the information_schema to the migration context, and removing the database cache in sqlutils?
I implemented that proposed change here: 84bdfdb - let me know what you think of the approach! I believe with these changes, each migration will maintain its own connections and clean them up at the end, allowing migrations to run concurrently.

Thank you for working on this.
I do not have objections to this, and can see the reasoning.

One thing I'd like not to change is sqlutils.go, which, as you notice, is under vendor. I'd like to keep it as it is (though I confess I've tweaked it here and there in other projects, and it's a technical debt I have to consolidate all changes).

Also, whatever happened to the finishedMigrating flag? The code looks so different now that I feel this is a completely new review.

@nikhilmat
Contributor Author

nikhilmat commented Oct 2, 2017

@shlomi-noach great, I believe all the changes on moving the DB connections are covered in this PR - I was able to successfully run concurrent migrations locally.

Totally understand about not wanting to change the source code in vendor/. Are you okay with submitting this change to https://github.com/outbrain/golib and then updating the dependencies here? If not, I can stop using sqlutils.GetDB altogether, and just rely on that method exposed on the migrator. Let me know which you prefer, and I'll make the change. (Relatedly, I had a question about some of the vendor dependencies here: #501)

Sorry for the confusion with the layered changes. The finishedMigrating flag is still present on the applier and migrator. I updated them to use atomic.LoadInt64 and atomic.StoreInt64 in this commit 5788ab5. Besides addressing the concurrency concerns, there were no changes in the last two commits pertaining to finishedMigrating since your previous review (unless I missed something).

Thanks again! Let me know how you'd like to proceed with the sqlutils changes.

@nikhilmat
Contributor Author

@shlomi-noach any thoughts on the above with regards to the vendor/ changes for the DB cache? Let me know if there's anything else here that needs revising :)

@nikhilmat
Contributor Author

@shlomi-noach friendly ping :) anything else i can do to help this along?

@shlomi-noach
Contributor

@nikhilmat sorry for the delay! I was out for a while and then busy catching up. Will get to it.

@nikhilmat
Contributor Author

@shlomi-noach no problem, thanks!

@shlomi-noach
Contributor

Again, apologies for the delay. Still on it, just have some serious backlog.

@nikhilmat
Contributor Author

@shlomi-noach thanks for checking in on this. I know there's still a decision to be made on how to apply the changes to the vendor directory:

Totally understand about not wanting to change the source code in vendor/. Are you okay with submitting this change to https://github.com/outbrain/golib and then updating the dependencies here? If not, I can stop using sqlutils.GetDB altogether, and just rely on that method exposed on the migrator. Let me know which you prefer, and I'll make the change.

Feel free to chime in with your preference and I can make the change before taking another pass. Otherwise, I can go ahead and remove the usage of sqlutils.GetDB and just rely on the migration-context-specific DB cache.

Thanks!

@shlomi-noach
Contributor

Apologies for keeping you waiting on such a small issue.
For now, I'd like you to restore sqlutils.go to its original state.

What I also see you've done is https://github.com/github/gh-ost/pull/479/files#diff-b9f3c8a49a606c2654d3c9c33727e879R253, which just duplicates the code in sqlutils.go? It also undermines your desire to avoid caching the DB object.

Furthermore, MigrationContext isn't the right place for generating the DB instance; consider it more as a data object that can be serialized.

I suggest the following:

  • Restore sqlutils.go to its original state
  • Remove the GetDB() code from MigrationContext
  • Create GetDB() in go/mysql/utils.go
  • Remove reference to sqlutils.GetDB and refactor as mysql.GetDB

Thank you!

@nikhilmat
Contributor Author

@shlomi-noach the goal with moving the caching to the MigrationContext was that caching would be done at a per-migration level, so that if migrations were running concurrently, closing a connection in one migration would not affect another. However, I'm happy to follow your suggestion, revert the sqlutils changes, and move that method to the mysql utils.

Thanks!

@shlomi-noach
Contributor

the goal with moving the caching to the MigrationContext was that the caching would be done a per migration level, and so if migrations were running concurrently, closing a connection in one migration would not affect another

I understand better now, thank you for explaining. To elaborate a bit, a futuristic idea is to have the MigrationContext instance serialized and persisted. To that effect, it would not be possible to assume database drivers/connections would exist when the instance loads. The specific use case right now is #302.

@nikhilmat
Contributor Author

Ah, cool! Thanks for sharing, that's good to know. I had a separate question about adding a custom, migration-specific logger (we're sending our logs to cloudwatch), and this is really helpful. Will open up an issue to discuss.

This is ready for another pass when you get a chance. Incorporated your feedback and updated with the latest master!

@nikhilmat
Contributor Author

👋 @shlomi-noach - do you think you'll get a chance to take another look at this soon?

@shlomi-noach
Contributor

@nikhilmat apologies yet again for the delays and I appreciate your patience, thank you for pinging.
This looks good to me!

What I'll do is merge this onto a local nm-refactor-migration-context branch, and send to testing.
You will see this PR as "Merged", but it won't be merged to master. I'll @mention you on the merged branch.
This complexity is required because of the decoupling we make between this public repo and our internal infrastructure.

@shlomi-noach changed the base branch from master to teardown-contrib on December 7, 2017 at 12:01
@shlomi-noach merged commit e99671e into github:teardown-contrib on Dec 7, 2017
@nikhilmat
Contributor Author

wonderful - thank you!!!
