I had a system for dropping aged backups from a partition of backup tarballs. Each host had its own directory. Within each directory I would define a file (e.g. 00info) that my pruner would read and run a find against. The problem it ran into was backups entering the directory that didn't match any of the patterns in the file. It relied primarily on find(1), like:
for pat in $patterns; do find . -type f -name "$pat" -mtime +7 | xargs rm -f; done
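A minimal sketch of the whole pruner under those conventions; the /backups layout and the one-glob-pattern-per-line format of 00info are assumptions for illustration, not my literal setup:

# Walk each host's directory, read its pattern file, and prune
# matching files older than a week. Paths are hypothetical.
for dir in /backups/*/; do
    [ -f "${dir}00info" ] || continue
    while IFS= read -r pat; do
        [ -n "$pat" ] || continue
        find "$dir" -type f -name "$pat" -mtime +7 -print0 | xargs -0 rm -f
    done < "${dir}00info"
done
# The weak spot: files matching none of the patterns are never pruned.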
This was not great, but it was very simple. And I find that if something is simple to maintain, you'll actually have time to maintain it amid normal everyday pressures.
If you're programming in Python, a bash script isn't going to compare to what you're capable of. So the important thing I'd suggest is: don't feel guilty about having something no one else uses. You've created a solution that is correct for your requirements, and you can't be more correct than that.
Is there an actual problem your script isn't solving, though? Has it become difficult to maintain the rule-set?
I'll trade you my answer to your question for your answer to mine: What knobs have to be fiddled in /proc or /sys to keep all the inodes in memory?
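(The closest thing I know of is vm.vfs_cache_pressure, sketched below, but I suspect it isn't the whole story.)

# Default is 100; lower values make the kernel prefer retaining
# dentry/inode caches over reclaiming them (0 means never reclaim,
# which risks running out of memory under pressure):
sysctl -w vm.vfs_cache_pressure=50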
Now for my answer to your question:
I'm struggling with a similar-ish issue, where I'm trying to get ls -l to work quickly over NFS for a directory with a few thousand files when the server is heavily loaded.
A NetApp performs the task brilliantly; everything else I've tried so far doesn't.
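For anyone wanting to reproduce the pain, a rough way to quantify it (the mount point is a hypothetical stand-in):

# ls -l over NFS issues a GETATTR round trip per file, which is
# what hurts on a loaded server:
nfsstat -c                               # client-side op counters, before
time ls -l /mnt/nfs/bigdir > /dev/null
nfsstat -c                               # diff the GETATTR count, after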
Researching this, I've found a few filesystems that separate metadata from data, but they all have some shortcomings:
- dualfs: Has patches available for kernel 2.4.19, but not much else.
- lustre: ls -l is a worst-case scenario because all the metadata except the file size is stored on the metadata server.
- QFS for Solaris, StorNext/Xsan: Not known for great metadata performance without a substantial investment.
So that won't help (unless you can revive dualfs).
The best answer in your case is to increase your spindle count as much as possible. The ugliest, but cheapest and most practical, way to do this is to pick up an enterprise-class JBOD (or two) and a Fibre Channel card off eBay, a few years old. If you look hard, you should be able to keep your costs under $500 or so. The search terms "146gb" and "73gb" will be of great help. You should be able to convince a seller to make a deal on something like this, since they've got a bunch of them sitting around and hardly any interested buyers:
http://cgi.ebay.ca/StorageTek-Fibre-Channel-2TB-14-Bay-HDD-Array-JBOD-NAS-/120654381562?pt=UK_Computing_Networking_SM&hash=item1c178fc1fa#ht_2805wt_1056
Set up a RAID-0 stripe across all the drives. Back up your data religiously, because one or two of the drives will inevitably fail. Use tar for the backup instead of cp or rsync so that the single receiving drive won't have to deal with millions of inodes.
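A rough sketch of that setup, assuming the JBOD's fourteen drives show up as /dev/sdb through /dev/sdo (device names and mount points are made up):

# Stripe all fourteen drives into one RAID-0 device:
mdadm --create /dev/md0 --level=0 --raid-devices=14 /dev/sd[b-o]
mkfs -t ext4 /dev/md0
mount /dev/md0 /srv/stripe

# Back up as one sequential tar stream so the destination drive never
# has to create millions of inodes:
tar -C /srv/stripe -cf - . | gzip -c > /mnt/backup/stripe.tar.gz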
This is the single cheapest way I've found (at this particular historical moment, anyway) to increase IOPS for filesystems in the 2-4 TB range.
Hope that helps - or is at least interesting!
Best Answer
An inode is a data structure that contains information about a file. You might be thinking of inode numbers, which are indexes into a list of inodes.
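To see the distinction on a live system (the file name is just an example):

ls -i /etc/hostname     # prints the inode *number*: an index, not the inode
stat /etc/hostname      # prints the inode's contents: mode, owner, size,
                        # timestamps, link count, and the number itself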