Use the sed delete command (here assuming GNU sed, as found on Ubuntu and other GNU systems):
sed -i '/^[@#]/ d' sample.txt
If you need to account for leading whitespace (in GNU sed, \s matches spaces and tabs):
sed -i '/^\s*[@#]/ d' sample.txt
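A quick way to see what each command removes is to run both against a throwaway file. This is a minimal sketch assuming GNU sed (for -i with no backup suffix and for \s); the file name and its contents are made up for the demo.

```shell
#!/bin/sh
# Demo only: assumes GNU sed (-i with no suffix, \s for whitespace).
# sample.txt is a throwaway file with made-up contents.
printf '%s\n' '@directive' '# comment' '  # indented comment' 'keep me' > sample.txt

# Delete lines whose very first character is @ or #
sed -i '/^[@#]/ d' sample.txt
cat sample.txt    # the indented comment survives this step

# Also delete lines where @ or # is preceded only by whitespace
sed -i '/^\s*[@#]/ d' sample.txt
cat sample.txt    # only "keep me" is left
```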
Since your file appears to consist of a sequence of records separated by one or more blank lines, I'd suggest trying something based on the paragraph mode of either awk or perl.
For example, if you always need to strip off the first two lines of each record, such as
1
00:00:00.096 --> 00:00:05.047
you could split each blank-line-separated paragraph into newline-delimited fields and skip the first two fields, using either
awk -vRS= -vORS= -F'\n' '{for(j=3;j<=NF;j++) print $j; print " "}' file.vtt
or
perl -F'\n' -00ne 'print join("", @F[2..$#F]), " "' file.vtt
If you can't rely on there being a fixed number of fields (lines) to remove, it's fairly easy to add a regular expression test instead. This is a little easier in perl, since it lets us grep directly on arrays rather than writing an explicit loop. For example, to split into blank-line-separated records and then print only those fields (lines) containing at least one run of three or more consecutive alphabetic characters, you could use
perl -F'\n' -00ane '
print join("", grep { /[[:alpha:]]{3}/ } @F), " "
' file.vtt
If you want to exclude the WEBVTT string, you can simply skip the first record (i.e. only print when the record number $. is greater than 1):
perl -F'\n' -00ane '
print join("", grep { /[[:alpha:]]{3}/ } @F), " " if $. > 1
' file.vtt
It will be up to you to choose a suitable regex that captures the wanted lines and excludes the unwanted ones. You can add an END block in either awk or perl if you want to add a final newline to the concatenated output.
NOTE: since (based on the discussion in comments) your files appear to have DOS-style CRLF line endings, you will need to deal with those. Otherwise the "blank" separator lines actually contain a CR, so paragraph mode never sees them as empty. Either modify the field and record separators in the above commands accordingly, or strip out the CRs first, e.g.
sed 's/\r$//' file.vtt |
perl -F'\n' -00ane '
print join("", grep { /[[:alpha:]]{3}/ } @F), " " if $. > 1
'
On your file, this gave output like the following (with no trailing newline, the shell prompt follows straight after it):
you're the four functions if you would of management first of all you have the planning the planning stages basically you were choosing appropriate organizational goals and courses action to best achieve those goals steeldriver@xenial-vm:~/test/$
Best Answer
Try this adaptation of your sed one-liner:
It matches the range from your first pattern to the first line NOT starting with a space character, and deletes the lines starting with a space or an "i" (for the leading iface). This will need rethinking should the i be required after the block.
Looks like this works:
Please try it and report back.