I am trying to pare down a huge database in order to use the relevant information for a JSON file. It has some very long lines (~400 characters per line) and a few thousand entries in which I need to omit everything from the ( and beyond, everything from the http and beyond, or everything from MISSING and beyond, depending on the line.
Most lines do not contain the ()[] information, but all contain the http information. The http information always follows the () information on the lines that do contain it.
Here is an example; I cut off the lines for obvious reasons.
PCSH10160 Attack of the Toy Tanks (3.61+!) [3.69] http://zeu
PCSH10162 Paradox Soul http://zeus.dl.playstation.net/cdn
PCSH10146 Hoggy2 http://zeus.dl.playstation.net/cdn/HP2005/
PCSB01394 Mekabolt http://zeus.dl.playstation.net/cdn/EP0
PCSH10186 Himno http://zeus.dl.playstation.net/cdn/HP2
PCSG01285 MELLKISS http://zeus.dl.playstation.net/cdn/JP0
PCSB01365 Habroxia http://zeus.dl.playstation.net/cdn/EP5
PCSE01423 Color Slayer http://zeus.dl.playstation.net/cdn
PCSE01396 Habroxia http://zeus.dl.playstation.net/cdn/UP4
PCSG01127 Sen no Hatou, Tsukisome no Kouki http://zeus.dl
PCSB01396 Tic-Tac-Letters by POWGI http://zeus.dl.playsta
PCSH10203 Gravity Duck http://zeus.dl.playstation.net
PCSH10175 Crossovers by POWGI http://zeus.dl.playstation
PCSH10169 Mixups by POWGI (3.61+!) [3.69] http://zeus.dl
PCSH10167 One Word by POWGI http://zeus.dl.playstation
PCSH10166 Word Search by POWGI http://zeus.dl.playsta
PCSH10179 Word Wheel by POWGI http://zeus.dl.playstation
PCSH10180 Wordsweeper by POWGI http://zeus.dl.playsta
PCSH10168 Word Sudoku by POWGI http://zeus.dl.playsta
PCSB00625 SENRAN KAGURA: Bon Appétit! Stacked Soundtrack ht
The end result should be
PCSH10160 Attack of the Toy Tanks
PCSH10162 Paradox Soul
PCSH10146 Hoggy2
PCSB01394 Mekabolt
PCSH10186 Himno
PCSG01285 MELLKISS
PCSB01365 Habroxia
PCSE01423 Color Slayer
PCSE01396 Habroxia
PCSG01127 Sen no Hatou, Tsukisome no Kouki
PCSB01396 Tic-Tac-Letters by POWGI
PCSH10203 Gravity Duck
PCSH10175 Crossovers by POWGI
PCSH10169 Mixups by POWGI
PCSH10167 One Word by POWGI
PCSH10166 Word Search by POWGI
PCSH10179 Word Wheel by POWGI
PCSH10180 Wordsweeper by POWGI
PCSH10168 Word Sudoku by POWGI
PCSB00625 SENRAN KAGURA: Bon Appétit! Stacked Soundtrack
I'm not concerned about the spacing between ID and title as that can be fixed by hand.
Ooooof. I goofed. After running the supplied expression(s) I noticed a smattering of lines that contain the word MISSING followed by various information. Is there a way to have that included in the expression alongside the ( and http?
A separate expression would also work. It just has to respect case, as I'm concerned about the word "missing" appearing in a title somewhere and everything beyond that point being culled.
PCSG00742 Kiss Ato
PCSG00744 One Piece: Burning Blood - Gold Edition
PCSG00747 Zero Escape: Zero Time Dilemma
PCSG00748 Jikkyou Powerful Pro Yakyuu 2016 MISSING KO5ifR1dQ+d7
PCSG00750 Kai-ri-Sei Million Arthur
PCSG00751 Arcana Famiglia -La Storia Della Arcana Famiglia- Ancora
PCSG00752 Touhou Soujinengi V
PCSG00753 Eikoku Tantei Mysteria: The Crown MISSING KO5ifR1dQ+d7
PCSG00756 I am Setsuna
Best Answer
"I need to omit everything from ( and beyond, or everything from http and beyond"
Menu "Search" > "Replace" (or Ctrl + H)
Set "Find what" to
\(.*?$|http.*?$
Leave "Replace with" empty
Enable "Regular expression"
Click "Replace All"
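For anyone who would rather script this than click through the dialog, here is a minimal Python sketch of the same replacement. It is not part of the Notepad++ workflow above; the names PATTERN, sample and cleaned are made up for illustration, and the two sample lines are taken from the question. Note that, like the Notepad++ replacement, it leaves the space before the removed text in place.

```python
import re

# Same alternation as the "Find what" field above.
# re.MULTILINE makes $ match at the end of every line, not just the end of the text.
PATTERN = re.compile(r"\(.*?$|http.*?$", re.MULTILINE)

# Two (truncated) lines copied from the question.
sample = (
    "PCSH10160 Attack of the Toy Tanks (3.61+!) [3.69] http://zeu\n"
    "PCSH10162 Paradox Soul http://zeus.dl.playstation.net/cdn\n"
)

# Replace every match with nothing, as "Replace with" left empty does.
cleaned = PATTERN.sub("", sample)
print(cleaned, end="")
```

Each line keeps one trailing space here, which matches the question's remark that spacing can be tidied by hand afterwards.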
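Note: to also cut everything from MISSING onward, add it as a third alternative. Tick "Match case" so that a lowercase "missing" inside a title is left alone:
\(.*?$|http.*?$|MISSING.*?$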
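Following the conversation in the comments, the fastest regular expression is the one below. The leading \h+ also consumes the horizontal whitespace in front of the marker, so no trailing space is left behind:
\h+(?:\(|http|MISSING).+$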
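As a rough cross-check outside Notepad++, the same idea can be sketched in Python. Two assumptions to flag: Python's re module has no \h, so [ \t]+ is used in its place, and the third sample line (ID PCSX00000, title and URL) is invented purely to show that a lowercase "missing" in a title survives. TAIL and strip_tail are hypothetical names.

```python
import re

# [ \t]+ stands in for \h+ (horizontal whitespace), which Python's re lacks.
# No re.IGNORECASE flag is passed, so only uppercase MISSING triggers a cut.
TAIL = re.compile(r"[ \t]+(?:\(|http|MISSING).+$", re.MULTILINE)


def strip_tail(line: str) -> str:
    """Remove the first marker, the whitespace before it, and the rest of the line."""
    return TAIL.sub("", line)


lines = [
    # Taken from the question.
    "PCSH10169 Mixups by POWGI (3.61+!) [3.69] http://zeus.dl",
    "PCSG00748 Jikkyou Powerful Pro Yakyuu 2016 MISSING KO5ifR1dQ+d7",
    # Invented line: lowercase "missing" must not be treated as a marker.
    "PCSX00000 The missing Link http://example.invalid/x",
]

for line in lines:
    print(strip_tail(line))
```

Because matching is case-sensitive by default in Python, the invented title is only cut at http, which is the behaviour the follow-up question asks for.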