Bash – Copy a specific percentage of each file in a directory to a new file

bash, head, text-processing, wc

For example, we have N files (file1, file2, file3, …).

We need the first 20% of each of them, and the result directory should look like (file1_20, file2_20, file3_20, …).

I was thinking of using wc to get the number of lines in each file, then multiplying by 0.2.

Then I would use head to take that 20% and redirect it to a new file, but I don't know how to automate it.

Best Answer

So creating a single example to work from:

root@crunchbang-ibm3:~# echo {0..100} > file1        
root@crunchbang-ibm3:~# cat file1
    0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100

We can grab the size of the file in bytes with stat:

root@crunchbang-ibm3:~# stat --printf %s "file1"
294
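
Note that stat --printf is a GNU coreutils option. Where it is not available, wc -c reading from standard input should give the same byte count. A minimal sketch, where byte_size is a made-up helper name for illustration:

```shell
#!/usr/bin/env bash
# byte_size is a hypothetical helper name, not part of the original answer.
# Reading from stdin makes wc print only the number, not the file name.
byte_size() {
    local n
    n=$(wc -c < "$1")
    # Arithmetic expansion drops the padding some wc versions add
    echo $(( n ))
}
```

For the example file above, byte_size file1 would be used in place of the stat call.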

And then using bc we can multiply the size by .2:

root@crunchbang-ibm3:~# echo "294*.2" | bc
58.8

However, we get a float, so let's convert it to an integer for head (dd might work here too):

root@crunchbang-ibm3:~# printf %.0f "58.8" 
59
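
The bc and printf steps can also be folded into the shell's own integer arithmetic. This is only a sketch of an alternative, not what the answer uses: percent_of is a made-up name, and it assumes non-negative whole-number inputs.

```shell
#!/usr/bin/env bash
# percent_of is a hypothetical helper name, not part of the original answer.
# Prints PERCENT% of SIZE rounded to the nearest integer (halves round up),
# using only integer arithmetic: adding 50 before dividing by 100 does the rounding.
percent_of() {
    local size=$1 percent=$2
    echo $(( (size * percent + 50) / 100 ))
}

percent_of 294 20   # 58.8 rounds to 59
```

This avoids the dependency on bc, at the cost of only handling whole-number percentages.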

And finally the first twenty percent (give or take a byte) of file1:

root@crunchbang-ibm3:~# head -c "59" "file1" 
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22

Putting it together, we could then do something like this:

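mkdir -p a_new_directory
for f in file*; do
    file_size=$(stat --printf %s "$f")                    # size in bytes
    percent_size_as_float=$(echo "$file_size*.2" | bc)    # 20% of the size
    float_to_int=$(printf %.0f "$percent_size_as_float")  # round to an integer
    new_fn="${f}_20" # new file name, e.g. file1_20
    # Write straight to the new file: passing the data through a variable and
    # printf would mangle % signs and backslashes and drop trailing newlines
    head -c "$float_to_int" "$f" > "a_new_directory/$new_fn"
done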

Here f is a placeholder for each item whose name matches file* in the directory where the for loop is run.

Once the loop has run:

root@crunchbang-ibm3:~# cat a_new_directory/file1_20
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 

update (to grab 20% of lines):

To grab approximately the first 20% of lines, we could replace stat --printf %s "$f" with:

wc -l < "$f"

head then also needs -n in place of -c, so that it counts lines rather than bytes. Because printf %.0f rounds the bc result to the nearest whole number, a file that is only 1 or 2 lines long rounds to 0 lines and is missed entirely. So we would want not only to round, but also to default to grabbing at least 1 line.
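
One way the line-based variant with a one-line minimum might look is sketched below. The name first_lines_percent is made up for illustration, and the rounding is done with shell integer arithmetic rather than bc and printf, so treat it as an assumption-laden sketch rather than the method above.

```shell
#!/usr/bin/env bash
# first_lines_percent is a hypothetical helper name used for illustration.
# Prints the first PERCENT% of the lines of FILE, rounded to the nearest line,
# but never asks head for fewer than one line.
first_lines_percent() {
    local file=$1 percent=$2 total count
    total=$(wc -l < "$file")
    count=$(( (total * percent + 50) / 100 ))
    # Fall back to a single line for very short files
    if (( count < 1 )); then
        count=1
    fi
    head -n "$count" "$file"
}

# Possible use, mirroring the loop above:
#   mkdir -p a_new_directory
#   for f in file*; do
#       first_lines_percent "$f" 20 > "a_new_directory/${f}_20"
#   done
```

With a 10-line file this keeps the first 2 lines, and with a 1-line file it keeps that single line instead of producing an empty result.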
