I am working on a project in which I need to remove all formatting from a text file, including whitespace and line breaks, then replace the colons that act as field delimiters with equals signs and separate the fields with pipes. I've made some headway, but I cannot find a way to mask out the parts that need to be ignored. I am new to sed and only a novice at Bash scripting, and I am not entirely sure sed is the right tool for the job (maybe vi? I typically use nano). The file that I am trying to format is similar to this:
== LUN mysql05-dbdat02 ==
LUNName: mysql05-dbdat02
CollectionStartTime: 2012-09-20T15:43:03-04:00
CollectionEndTime: 2012-09-20T15:43:34-04:00
Capacity
CurrentCapacity: 512
IOOperations
Reads: 100
Writes: 0
ReadsPerSecond: 0.000000
WritesPerSecond: 0.000000
ReadMBPerSecond: 0.000
WriteMBPerSecond: 0.000
TotalMBPerSecond: 0.000
NonOptimizedIOPerSecond: 0.000000
CacheHitPercentage: 0.000
PerformanceMetrics
TotalIOsPerSecond: 0.000
ReadIOsPerSecond: 0.000
WriteIOsPerSecond: 0.000
TotalMBPerSecond: 0.000
ReadMBPerSecond: 0.000
WriteMBPerSecond: 0.000
Performance
== LUN mysql05-dbdat02 ==
LUNName: mysql05-dbdat02
CollectionStartTime: 2012-09-20T15:43:03-04:00
CollectionEndTime: 2012-09-20T15:43:34-04:00
Capacity
CurrentCapacity: 512
IOOperations
Reads: 100
Writes: 0
ReadsPerSecond: 0.000000
WritesPerSecond: 0.000000
ReadMBPerSecond: 0.000
WriteMBPerSecond: 0.000
TotalMBPerSecond: 0.000
NonOptimizedIOPerSecond: 0.000000
CacheHitPercentage: 0.000
PerformanceMetrics
TotalIOsPerSecond: 0.000
ReadIOsPerSecond: 0.000
WriteIOsPerSecond: 0.000
TotalMBPerSecond: 0.000
ReadMBPerSecond: 0.000
WriteMBPerSecond: 0.000
Performance
The output needs to be something like this:
cm-data-unity01|LUNNam=cm-data-unity01|CollectionStartTim=2012-09-20T15:43:03-04:00|CollectionEndTim=2012-09-20T15:43:34-04:00|Capacity|CurrentCapacit=2048|IOOperations|Read=10|Write=90|ReadsPerSecon=8.000000|WritesPerSecon=76.000000|ReadMBPerSecon=0.430|WriteMBPerSecon=0.542|TotalMBPerSecon=0.973|NonOptimizedIOPerSecon=85.000000|CacheHitPercentag=0.000|PerformanceMetrics|TotalIOsPerSecon=84.000|ReadIOsPerSecon=8.000|WriteIOsPerSecon=76.000|TotalMBPerSecon=0.973|ReadMBPerSecon=0.430|WriteMBPerSecon=0.542|Performance|
or, put another way, all on one line.
I have written a very simple Bash script to format it:
#!/bin/bash
# Author Christopher George Bollinger
# Comments: This script will modify the snippet.txt file.
# This script is meant to, first, take a specific bit of unformatted data and remove all line breaks and non-printable characters.
# Following this, the script is to replace any appropriate colons (those being used as delimiters) with the equals (=) character.
echo "This script will remove line breaks, remove non-printable characters, and will replace colons used as field delimiters with the equals '(=)' character."
cp snippet.txt snippetwork.txt
RmLB ()
{
tr -d '\n' < snippetwork.txt > snippetwork1.txt
}
RmNonPrint ()
{
tr -cd "[:print:]" < snippetwork1.txt > snippetwork2.txt
}
RplcW ()
{
sed 's/: /=/g' snippetwork2.txt > snippetwork3.txt
}
RmWtSpc ()
{
tr -s ' ' '|' < snippetwork3.txt > snippetgood.txt
sed 'd/(?:[a-z]=) /'
}
QuChek ()
{
cat snippetgood.txt
read -p "Is this satisfactory? (Y/n)" Choice
case $Choice in
Y|y)
mv snippetgood.txt snippet.txt
rm -f snippetwork*
rm -f snippetgood.txt
;;
N|n)
exit
;;
*)
echo "Invalid Input."
;;
esac
}
read -p "Would you like to begin? (Y/n)" YorN
case $YorN in
Y|y)
RmLB
RmNonPrint
RplcW
RmWtSpc
QuChek
;;
N|n)
exit
;;
*)
echo "Invalid Selection"
;;
esac
This runs, but the output is not quite right. It gives:
==|LUN|mysql05-dbdat02|==|LUNName=|mysql05-dbdat02|CollectionStartTime=|2012-09-20T15:43:03-04:00|CollectionEndTime=|2012-09-20T15:43:34-04:00|Capacity|CurrentCapacity=|512|IOOperations|Reads=|100|Writes=|0|ReadsPerSecond=|0.000000|WritesPerSecond=|0.000000|ReadMBPerSecond=|0.000|WriteMBPerSecond=|0.000|TotalMBPerSecond=|0.000|NonOptimizedIOPerSecond=|0.000000|CacheHitPercentage=|0.000|PerformanceMetrics|TotalIOsPerSecond=|0.000|ReadIOsPerSecond=|0.000|WriteIOsPerSecond=|0.000|TotalMBPerSecond=|0.000|ReadMBPerSecond=|0.000|WriteMBPerSecond=|0.000|Performance|==|LUN|mysql05-dbdat02|==|LUNName=|mysql05-dbdat02|CollectionStartTime=|2012-09-20T15:43:03-04:00|CollectionEndTime=|2012-09-20T15:43:34-04:00|Capacity|CurrentCapacity=|512|IOOperations|Reads=|100|Writes=|0|ReadsPerSecond=|0.000000|WritesPerSecond=|0.000000|ReadMBPerSecond=|0.000|WriteMBPerSecond=|0.000|TotalMBPerSecond=|0.000|NonOptimizedIOPerSecond=|0.000000|CacheHitPercentage=|0.000|PerformanceMetrics|TotalIOsPerSecond=|0.000|ReadIOsPerSecond=|0.000|WriteIOsPerSecond=|0.000|TotalMBPerSecond=|0.000|ReadMBPerSecond=|0.000|WriteMBPerSecond=|0.000|Performance|
The problem is the pipes that appear after the equals signs. If anyone could point me in the right direction on getting this right, or even to an online resource for some clarification, I would be immensely grateful.
The funny thing is that, while the immediate request is to format the data like the example above, the end goal is to feed it into a Unix CLI graphing tool (my guess is gnuplot). From what I understand, gnuplot requires its input to be formatted in columns. As mentioned, this is new territory for me, and I would greatly appreciate any advice.
Best Answer
I am not quite sure what you're trying to do. Using your first input file, I create this output:
With this Perl one-liner:
You could also do it with this:
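A sed and tr version is also possible. What follows is only a rough sketch, not a definitive solution. It assumes that the key/value lines are indented with spaces or tabs, that each record starts with a header of the form "== LUN name ==", and that the only colons to convert are those followed by whitespace (so the colons inside the timestamps are left alone). The function name flatten is made up for illustration. The stray pipes after the equals signs in the question most likely come from 's/: /=/g' consuming only one of several spaces after each colon, so the sketch swallows the whole run of whitespace instead.

```shell
#!/bin/bash
# flatten: hypothetical helper that reads snippet-style text on stdin and
# writes one pipe-delimited line (with a trailing pipe) to stdout.
flatten() {
    # 1. Trim leading and trailing whitespace from every line.
    # 2. Reduce a "== LUN name ==" header to just the name.
    # 3. Turn the first colon that is followed by whitespace into "=",
    #    swallowing all of that whitespace; "15:43:03" is untouched
    #    because its colons are not followed by whitespace.
    # 4. Drop lines that are now empty.
    sed -e 's/^[[:space:]]*//' \
        -e 's/[[:space:]]*$//' \
        -e 's/^== LUN \(.*\) ==$/\1/' \
        -e 's/:[[:space:]][[:space:]]*/=/' \
        -e '/^$/d' |
    # Remove any remaining non-printable characters, but keep newlines
    # for now so the next step can still see the line boundaries.
    tr -cd '[:print:]\n' |
    # Join everything into a single line, using "|" as the separator.
    tr '\n' '|'
}
```

It could then be called as "flatten < snippet.txt > snippetgood.txt". Doing the substitutions line by line before joining avoids having to guess afterwards which spaces were indentation and which separated a key from its value.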