I have written a shell script that has to do the following:
- Capture session commands into one file.
- Write every individual command to a separate file.
- Mail the content of each individual command file based on certain criteria.
By my observation, the loop has to iterate a minimum of 25,000 times. My problem is that it takes more than 6 hours to complete all iterations.
Below is the main part of the script, which is taking a long time to process.
if [ -s "$LOC/check.txt" ]; then
    while read line; do
        echo -e " started processing $line at `date` " >> "$SCRIPT_LOC/running_status.txt"
        TST=`grep -w $line $PERM_LOC/id_processing.txt`
        USER=`echo $TST | grep -w $line | awk -F '"' '{print $10}'`
        HOST=`echo $TST | grep -w $line | awk -F '"' '{print $18}'`
        ID=`echo $TST | echo $line | tr -d '\"'`
        IP=`echo $TST | grep -w $line | awk -F '"' '{print $20}'`
        DB=`echo $TST | grep -w $line | awk -F '"' '{print $22}'`
        CONN_TSMP=`echo $TST | grep -w $line | awk -F '"' '{print $2}'`
        if [ -z "$IP" ]; then
            IP=$HOST
        fi
        if [ "$USER" == "root" ] && [ -z "$DB" ]; then
            TARGET=/data1/sessions/root_sec
            CMD_TARGET=/data1/commands/root_commands
            FILE="$ID-$CONN_TSMP-$USER@$IP.txt"
        else
            TARGET=/data1/sessions/user_sec
            CMD_TARGET=/data1/commands/user_commands
            FILE="$ID-$CONN_TSMP-$USER@$IP.txt"
        fi
        ls $TARGET/$FILE
        if [ $? -ne 0 ]; then
            echo $TST | awk -F 'STATUS="0"' '{print $2}' | sed "s/[</>]//g" >> "$TARGET/$FILE"
            echo -e "\n" >> "$TARGET/$FILE"
        fi
        grep $line $LOC/out.txt > "$LOC/temp.txt"
        while read val; do
            TSMP=`echo "$val" | awk -F '"' '{print $2}'`
            QUERY=`echo "$val" | awk -F 'SQLTEXT=' '{print $2}' | sed "s/[/]//g"`
            echo " TIMESTAMP=$TSMP " >> "$TARGET/$FILE"
            echo " QUERY=$QUERY " >> "$TARGET/$FILE"
            RES=`echo "$QUERY" | awk '{print $1}' | sed 's/["]//g'`
            TEXT=`grep "$RES" "$PERM_LOC/commands.txt"`
            if [ -n "$TEXT" ]; then
                NUM=`expr $NUM + 1`
                SUB_FILE="$ID-$command-$NUM-$TSMP-$USER@$IP.txt"
                echo -e "===============\n" > "$CMD_TARGET/$SUB_FILE"
                echo "FILE = \"$SUB_FILE\"" >> "$CMD_TARGET/$SUB_FILE"
                ### same way append 6 more lines to $SUB_FILE
                SUB=`echo "$WARN_ME" | grep "$command"`
                if [ "$command" == "$VC" ]; then
                    STATE="very critical"
                elif [ -z "$SUB" ]; then
                    STATE=CRITICAL
                else
                    STATE=WARNING
                fi
                if [ "$USER" != "root" -a "$command" != "$VC" ]; then
                    mail command &
                elif [ "$USER" == "root" -a -z "$HOST" ]; then
                    mail command &
                elif [ "$USER" == "root" -a "$command" == "$VC" ]; then
                    mail command &
                else
                    echo -e "some message \n" >> $LOC/operations.txt
                fi
            fi
        done < "$LOC/temp.txt"
    done < "$LOC/check.txt"
fi
Can anyone help me optimize this code, whether by splitting it up, changing the logic, using functions, or anything else?
I have to use a shell script only, and the script must not use more than 3 GB of RAM on the server where it runs.
Any help is very much appreciated.
Best Answer
Oh my!
I can see why it takes forever to run: you're repeating operations, not caching information, and pretty much beating the computer to death. Poor computer. :(
Awk is not lightweight, and you're invoking it many, many times over the same data. I was able to run it once and set all five variables.
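For example, the five awk calls per record could collapse into one. Here's a minimal sketch, assuming the same quoted-field layout the original relies on ($2, $10, $18, $20, $22); the sample $TST line is made up purely for illustration:

```shell
#!/bin/sh
# Hypothetical record mimicking the quoted-field layout in the question.
TST='A="ts1" B="v2" C="v3" D="v4" E="user1" F="v6" G="v7" H="v8" I="host1" J="ip1" K="db1"'

# One awk run emits all five values, joined with '|' so values containing
# spaces survive; read splits them back into the five variables.
IFS='|' read CONN_TSMP USER HOST IP DB <<EOF
$(printf '%s\n' "$TST" | awk -F '"' -v OFS='|' '{print $2, $10, $18, $20, $22}')
EOF

echo "$CONN_TSMP $USER $HOST $IP $DB"
```

That's one awk process per record instead of five (and it drops the redundant `grep -w $line` in each pipeline, since $TST already came from that grep).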
Without knowing what this is supposed to accomplish, there's only so much that can be done.
Considering that ALL of the processing is greps, awks, seds and trs, you could get an impressive speed boost by rewriting this script in Perl. Perl is/was designed to handle text and reports; it can do all that grep/awk/sed/tr work internally without shelling out to another program repeatedly.
But here's some improvements:
Hmm, "shell script only". Well, with that in mind, perhaps you could pre-grep "$LOC/check.txt" and/or "$LOC/temp.txt" so that you can work from the already-grepped output instead of grepping inside the loop.
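A minimal sketch of that pre-grep idea, using grep -f to pull every matching line out of out.txt in one pass instead of one grep per loop iteration (the file names follow the question; the data here is invented):

```shell
#!/bin/sh
# Set up toy versions of the question's files in a scratch directory.
LOC=$(mktemp -d)
printf 'id1\nid2\n' > "$LOC/check.txt"
printf 'id1 query-a\nid3 query-b\nid2 query-c\n' > "$LOC/out.txt"

# One pass: treat each ID in check.txt as a pattern (-f) matched as a
# whole word (-w), extracting every relevant line from out.txt at once.
matched=$(grep -w -f "$LOC/check.txt" "$LOC/out.txt")
printf '%s\n' "$matched"
```

The loop can then read the filtered output directly, which turns 25,000 greps over the same file into one.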
The more I look at it, the more convinced I am that awk could likely do all this work in a single pass through the data... AND process EVERY entry, not just the first one (as I pointed out in the comments, you really need another loop between the "read line" and "read val" loops).
It'd be a long awk script, but definitely doable. And awk is worth knowing, take a moment and play with it, it's not that difficult, just different. Grok Awk!
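For the flavor of it, here's a toy version of that single-pass structure: load the IDs from check.txt into an awk array, then stream out.txt once, handling each matching record as it goes by. The data is invented, and the real field extraction and file writing would go inside the matching block:

```shell
#!/bin/sh
# Toy input files standing in for the question's check.txt and out.txt.
LOC=$(mktemp -d)
printf 'id1\nid2\n' > "$LOC/check.txt"
printf 'id1 select-1\nid3 drop-t\nid2 update-x\n' > "$LOC/out.txt"

# NR==FNR is true only while reading the first file, so the first block
# just records the IDs; the second block runs once per line of out.txt.
result=$(awk '
    NR == FNR { ids[$1]; next }          # first file: remember every ID
    $1 in ids { print $1, "->", $2 }     # second file: process matches
' "$LOC/check.txt" "$LOC/out.txt")
printf '%s\n' "$result"
```

One awk process reads both files exactly once, versus thousands of grep/awk/sed processes in the original.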