
pat_barron wrote in linux 🤔 curious

How do I get RHEL 4 to flush its disk I/O buffers more frequently?

Here's a little story, and a question that arises from it...

I've spent some "quality time" this weekend moving a lot of data between two servers - from an old server that's about to be retired to a shiny new one. The amount of data moved is fairly large - about 250 gigabytes.

I finished moving all the data, edited the network config so the two machines would swap identities, and was ready to reboot both machines. Now, I've been using Unix-like systems long enough that I still run a "sync" by hand before I shut down a machine (old habits die hard...). So I got on the new machine and ran sync. And it (apparently) hung. I'm used to "sync" sometimes taking a few seconds, but this was way beyond that. A minute goes by. Two minutes go by. The sync is still hung. I was terrified that I'd hung the machine. I'm in Pittsburgh, the machine is in Dallas, there's nobody down there on a Sunday afternoon to kick the thing if it hangs, and I'm on a deadline to get this done, so this was especially disconcerting...

I opened a new ssh session to the new machine, and lo and behold, it let me log in. I ran "ps ax" and saw the sync process stuck in "D" state (waiting for I/O). So I ran "vmstat -d" to see if anything was going on. And there was *LOTS* going on. Lots and lots of disk I/O. Continuously. At least it was doing something... I'm glad I ran the "sync" by hand - if I'd just rebooted and let the system flush buffers as it shut down, it would have looked hung and I wouldn't have had any way to find out what was going on...

As it turned out, the "sync" took more than five minutes to complete (it may have actually been closer to 10 minutes, since I walked away to get a cup of coffee once I saw that the system was still alive - the sync was done by the time I got back).

As near as I can figure, it looks like the system is holding as much stuff in memory as it can, for as long as it can. And since the system was not otherwise busy while I was doing this, it was able to use most of physical memory as I/O buffers. And this system has 24 gigabytes of memory...
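One way to check that guess, rather than inferring it from how long "sync" takes, would be to look at the Dirty and Writeback lines in /proc/meminfo, which 2.6 kernels report in kB. Here's a minimal sketch in Python; the function names are made up for illustration, and it assumes the usual "Name:   12345 kB" layout of that file.

```python
import os


def parse_meminfo(text):
    """Return a dict mapping /proc/meminfo field names to their numeric value."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def unflushed_kb(text):
    """Dirty pages plus pages currently being written back, in kB."""
    info = parse_meminfo(text)
    return info.get("Dirty", 0) + info.get("Writeback", 0)


if __name__ == "__main__":
    # Only meaningful on Linux, where /proc/meminfo exists.
    if os.path.exists("/proc/meminfo"):
        with open("/proc/meminfo") as f:
            print("unflushed: %d kB" % unflushed_kb(f.read()))
```

Running something like this in a loop during a big copy should show whether the amount of not-yet-written data really climbs into the gigabytes.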

So my question is, is there any way I can make the system force a buffer flush more frequently? It was sort of disturbing to me that so much data was (apparently) in memory and hadn't been flushed to disk yet; if something had happened to crash the system in the interim and those buffers didn't get flushed, I would have lost a lot of data. And I'm not sure I would have even realized it until much later, unless "fsck" gave a lot of errors on the way back up. Way back in the "dark ages" of Unix, there was a process called "update" that was started out of /etc/rc and did a sync every 30 seconds. I have half a mind to just write a script to do that and put it into my rc scripts...
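For what it's worth, the "update"-style script I have in mind would be tiny. Here's a rough sketch in Python, not a tested rc script: run_update and its parameters are names invented for illustration, and os.sync() needs Python 3.3 or later, which a stock RHEL 4 box would not have (a shell loop around "sync" and "sleep 30" would do the same job there).

```python
import os
import time


def run_update(interval=30.0, iterations=None, sync=None, sleep=time.sleep):
    """Call sync() every `interval` seconds, like the old update daemon.

    iterations=None loops forever; a number stops after that many syncs.
    The sync and sleep callables can be swapped out, e.g. for a dry run.
    Returns the number of sync calls made.
    """
    if sync is None:
        sync = os.sync  # Unix only, Python 3.3+
    count = 0
    while iterations is None or count < iterations:
        sync()
        count += 1
        sleep(interval)
    return count
```

Calling run_update() with no arguments loops forever, syncing every 30 seconds. Note that this only forces a flush from outside; it doesn't change how the kernel itself decides when to write dirty buffers back.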