Efficient Way to Change One Line in a File

files find performance shell-script

I want to change the first line of hundreds of files recursively in the most efficient way possible. An example of what I want to do is to change #!/bin/bash to #!/bin/sh, so I came up with this command:

find ./* -type f -exec sed -i '1s/^#!\/bin\/bash/#!\/bin\/sh/' {} \;

But, to my understanding, doing it this way means sed has to read the whole file and replace the original. Is there a more efficient way to do this?

Best Answer

Yes, sed -i reads and rewrites the whole file. In general it has to: when the line length changes, every following line moves to a different position in the file.

...but in this case, the line length doesn't actually need to change. We can replace the hashbang line with #!/bin/sh␣␣ instead, with two trailing spaces. The OS will remove those when parsing the hashbang line. (Alternatively, use two newlines, or a newline + hash sign, both of which create extra lines the shell will eventually ignore.)
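The padding arithmetic is easy to check. This is just a quick sanity check, and the variable names old and new are mine:

```shell
old='#!/bin/bash'
new='#!/bin/sh  '   # two trailing spaces make up for the shorter path

# ${#var} expands to the length of the string in characters
echo "${#old} ${#new}"   # prints: 11 11
```

Both lines are 11 characters plus the newline, so writing the new one over the old one leaves every later byte of the file where it was.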

All we need to do is open the file for writing from the start, without truncating it. The usual redirections > and >> can't do that, but the read-write redirection <> can. It is specified by POSIX, so it works in Bash and in other sh-compatible shells:

echo '#!/bin/sh  ' 1<> foo.sh

or using dd (these should be standard POSIX options):

echo '#!/bin/sh  ' | dd of=foo.sh conv=notrunc

Note that strictly speaking, both of those rewrite the newline at the end of the line too, but it doesn't matter.

Of course, the above overwrites the start of the given file unconditionally. Adding a check that the original file has the correct hashbang is left as an exercise... Regardless, I probably wouldn't do this in production, and obviously, this won't work if you need to change the line to a longer one.
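For what it's worth, here is one way the check could look. Treat it as a sketch rather than a tested tool: the function name fix_hashbang is made up, and it assumes you only want to touch files whose first line is exactly #!/bin/bash (no options, newline-terminated).

```shell
#!/bin/sh
# Hypothetical helper: overwrite the hashbang in place, but only when the
# first line is exactly "#!/bin/bash" followed by a newline.
fix_hashbang() {
    file=$1
    # read returns non-zero if the file is unreadable, empty, or its first
    # line has no trailing newline; all of those cases are skipped
    IFS= read -r first < "$file" || return 1
    [ "$first" = '#!/bin/bash' ] || return 1
    # 12 bytes in, 12 bytes out: conv=notrunc keeps the rest of the file
    printf '#!/bin/sh  \n' | dd of="$file" conv=notrunc 2>/dev/null
}

# Process every file named on the command line, reporting the ones skipped
for f in "$@"; do
    fix_hashbang "$f" || printf 'skipped: %s\n' "$f" >&2
done
```

Saved as a script (say fix-hashbang.sh, also a made-up name), it could be run over a tree with find . -type f -exec ./fix-hashbang.sh {} + so that each file costs one short read and, if it matches, one 12-byte write.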

Related Solutions

Shell – Script for changing modification time of files and directories recursively

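Use find -exec for a recursive touch, passing the directories to process as command-line arguments. Note that combining -r with a relative -d to shift each file's timestamp by three hours relies on GNU touch.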

#!/bin/sh
for i in "$@"; do
    find "$i" -type f -exec touch -r {} -d '+3 hour' {} \;
done

You can run it like this:

./script.sh /path/to/dir1 /path/to/dir2

Shell – Is piping, shifting, or parameter expansion more efficient

Pretty simple with awk. This will get you the value of every third field, starting with the first (fields 1, 4, 7, ...), for input of any length:

$ awk -F' ' '{for( i=1;i<=NF;i+=3) { printf( "%s%s", $i, OFS ) }; printf( "\n" ) }' <<< $list
1 5 6 9 15

This works by leveraging built-in awk variables such as NF (the number of fields in the record) and a simple for loop that steps along the fields, giving you the ones you want without needing to know ahead of time how many there will be.
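If the trailing separator that loop leaves at the end of each line bothers you, a small variation avoids it. This is only a sketch: the function name every_third and the sample input are made up, since the contents of $list aren't shown here.

```shell
# Hypothetical wrapper: print fields 1, 4, 7, ... of each input line,
# separated by OFS, with no separator after the last printed field.
every_third() {
    awk '{
        for (i = 1; i <= NF; i += 3)
            # use OFS while another field will follow, ORS after the last one
            printf "%s%s", $i, (i + 3 <= NF ? OFS : ORS)
    }'
}

printf 'a b c d e f g\n' | every_third   # prints: a d g
```

The loop still relies only on NF, so it handles lines of any length.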

Or, if you do indeed just want those specific fields as specified in your example:

$ awk -F' ' '{ print $1, $4, $7, $10, $13 }' <<< $list
1 5 6 9 15

As for the question about efficiency, the simplest route is to run each of your methods under time and compare how long they take; you could also use tools like strace to see which system calls each method makes. Usage of time looks like:

$ time ./script.sh

real    0m0.025s
user    0m0.004s
sys     0m0.008s

You can compare that output between varying methods to see which is the most efficient in terms of time; other tools can be used for other efficiency metrics.
