EDITED THE QUESTION: I am now open to other types of solutions, not just batch as before.
I am on Windows and some people suggested sed etc., so I am OK with third-party standalone EXEs run from the command line.
Say I have the following lines in the file abc.txt:
"@yuy007 what are you doing friend #disneyrocks"
"STFU, i dont care what you think @happy55"
"@social88 @gg99 ok mate see you at the subway :)"
"btw arnold was great in that movie @tt11 @gg11 #disneyrocks"
"we are going to disney. Do you want to? #disneyrocks"
"We dont like disney. #disneyrocks we are not going"
".@socialguy what are you upto #disneyrocks "
I need to apply 5 filters to the above file to get def.txt:
- Delete all lines which start with the @ character, like the 1st and 3rd
- Delete all lines which start with the .@ characters, like the 7th
- Delete all lines which don't have any word starting with #, like the 2nd and 3rd
- In the leftover lines, delete all words starting with the @ character (keeping the rest of the line intact), like @happy55 in the 2nd, @social88 and @gg99 in the 3rd, etc. In this case we still need to preserve the quotes " at the start and end of each line
- Delete all the blank lines left after the above lines are removed
EDIT
If I have the following line, it wrongly deletes the content after the @words:
"btw arnold was great in that movie @tt101 @gb1997 #whatthehell"
is edited to
"btw arnold was great in that movie"
Thanks
Best Answer
You are going to want to use regular expressions for this. Because you've specified batch as your preferred scripting language, we'll need to add that functionality. There are several ways to accomplish this, but I like this version written by Dave Benham at dostips.com because it uses only binaries which should already be on your machine:
Copy that and save it as repl.bat. You may want to place it in your system path if you think you'll use it again. Otherwise just put it with the files you are working on. Now create another file for this task (I called it test.bat):
That should give you what you want. This has been modified to output Windows line endings (my text editor doesn't care so I didn't notice the problem).
The
repl "^[\s\q]@[^\s].*\r?\n?" "" XM
part of this removes every line which begins with a quote (or a whitespace character) immediately followed by a @. It will ignore lines which just have
"@ some text
or
@ some text
or just
@
or
"@
(the @ must be followed by at least one non-whitespace character). You may remove this requirement by removing the [^\s].

The
repl "[\s\q]@[^\s\q]+" "" X
part of this removes every word which begins with a @ and has at least one character after it which is not whitespace or a quote. We use the X parameter because it adds the \q escape sequence, which allows us to search for those pesky quotes. The M option is needed so that we can actually replace new lines (also, without it we'd have an extra blank line at the end). Further information can be found in the JScript RegEx reference.
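To see what the two patterns do without setting up repl.bat, here is a minimal sketch in Python rather than batch. It assumes Python's re module behaves like the JScript engine for these particular patterns, writes REPL.BAT's \q as a literal quote, and uses re.MULTILINE in place of the M option. The function name strip_mentions and the sample lines are made up for illustration.

```python
import re

# Python translations of the two repl patterns (an approximation, not repl.bat itself).
# \q (REPL.BAT's escape for a quote) becomes a literal ", and re.MULTILINE
# stands in for the M option so that ^ matches at the start of every line.
LINE_PATTERN = re.compile(r'^[\s"]@[^\s].*\r?\n?', re.MULTILINE)
WORD_PATTERN = re.compile(r'[\s"]@[^\s"]+')


def strip_mentions(text: str) -> str:
    """Remove lines opening with "@word, then remove the remaining @words."""
    without_lines = LINE_PATTERN.sub("", text)
    return WORD_PATTERN.sub("", without_lines)


sample = (
    '"@yuy007 what are you doing friend #disneyrocks"\n'
    '"btw arnold was great in that movie @tt11 @gg11 #disneyrocks"\n'
    '"@ some text"\n'
)
print(strip_mentions(sample), end="")
```

The first sample line is removed entirely, the second keeps its text and hashtag but loses both @words, and the third is left alone because its @ is followed by a space. Note that the word pattern also consumes the single character in front of the @ (a space or a quote), which is why the spacing stays tidy.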
Note: I've now fixed some issues with the above replacements and made them much simpler using better commands for this.
If you want to show only lines which contain a @ then you can use:
This one took a long time to get working in all situations, and I may have missed a few possible combinations. It will ignore email addresses and @ characters by themselves in a line, though. RegEx isn't great at negating results and requires a look-ahead to do this. The second part deals with some of that mess by removing all blank lines left over after the first call. This has the possibly unwanted side effect of also removing any blank lines that were already in the file.
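The look-ahead idea can be sketched as follows. This is a Python illustration with hypothetical patterns and names (NO_MENTION_LINE, BLANK_LINES, keep_mention_lines), not the original commands. It assumes an @ only counts when it sits at the line start or after a space, tab or quote, and is followed by a character that is neither whitespace nor a quote.

```python
import re

# Hypothetical patterns for illustration only.
# Step 1: match any line in which no "@word" occurs. The negative look-ahead
# (?!...) fails as soon as an @ is found that follows the line start, a space,
# a tab or a quote and is followed by a non-whitespace, non-quote character.
# That rule skips email addresses (user@example.com) and a lone @.
NO_MENTION_LINE = re.compile(r'^(?!.*(?:^|[ \t"])@[^\s"]).*$', re.MULTILINE)
# Step 2: match lines that are empty or contain only spaces and tabs.
BLANK_LINES = re.compile(r'^[ \t]*\r?\n', re.MULTILINE)


def keep_mention_lines(text: str) -> str:
    """Keep only lines containing an @word; blank lines are dropped too."""
    emptied = NO_MENTION_LINE.sub("", text)  # blank out lines without an @word
    return BLANK_LINES.sub("", emptied)      # remove the blank lines left behind


sample = (
    '"we like disney @tt11 #disneyrocks"\n'
    'mail me at user@example.com\n'
    'just an @ sign\n'
    '\n'
    '@gg99 see you\n'
)
print(keep_mention_lines(sample), end="")
```

Only the first and last sample lines survive. The blank line that was already in the input disappears as well, which is the side effect described above.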