How to remove everything in brackets or a particular word and beyond in Notepad++

notepad++ regex

I am trying to pare down a huge database so I can use the relevant information in a JSON file. It has some very long lines (~400 characters per line) and a few thousand entries. Depending on the line, I need to remove everything from the ( onward, everything from http onward, or everything from MISSING onward.

Most lines do not contain the ( ) and [ ] information, but all of them contain the http information. On the lines that do contain the ( ) information, the http information always follows it.

Here is an example; I have truncated the lines for obvious reasons.

PCSH10160    Attack of the Toy Tanks (3.61+!) [3.69]    http://zeu
PCSH10162    Paradox Soul    http://zeus.dl.playstation.net/cdn
PCSH10146    Hoggy2    http://zeus.dl.playstation.net/cdn/HP2005/
PCSB01394    Mekabolt    http://zeus.dl.playstation.net/cdn/EP0
PCSH10186        Himno    http://zeus.dl.playstation.net/cdn/HP2
PCSG01285    MELLKISS    http://zeus.dl.playstation.net/cdn/JP0
PCSB01365    Habroxia    http://zeus.dl.playstation.net/cdn/EP5
PCSE01423    Color Slayer    http://zeus.dl.playstation.net/cdn
PCSE01396    Habroxia    http://zeus.dl.playstation.net/cdn/UP4
PCSG01127    Sen no Hatou, Tsukisome no Kouki    http://zeus.dl
PCSB01396    Tic-Tac-Letters by POWGI    http://zeus.dl.playsta
PCSH10203        Gravity Duck    http://zeus.dl.playstation.net
PCSH10175        Crossovers by POWGI    http://zeus.dl.playstation
PCSH10169        Mixups by POWGI (3.61+!) [3.69]    http://zeus.dl
PCSH10167        One Word by POWGI    http://zeus.dl.playstation
PCSH10166        Word Search by POWGI    http://zeus.dl.playsta
PCSH10179        Word Wheel by POWGI    http://zeus.dl.playstation
PCSH10180        Wordsweeper by POWGI    http://zeus.dl.playsta
PCSH10168        Word Sudoku by POWGI    http://zeus.dl.playsta
PCSB00625    SENRAN KAGURA: Bon Appétit! Stacked Soundtrack    ht

The end result should be:

PCSH10160    Attack of the Toy Tanks
PCSH10162    Paradox Soul
PCSH10146    Hoggy2
PCSB01394    Mekabolt
PCSH10186        Himno
PCSG01285    MELLKISS
PCSB01365    Habroxia
PCSE01423    Color Slayer
PCSE01396    Habroxia
PCSG01127    Sen no Hatou, Tsukisome no Kouki
PCSB01396    Tic-Tac-Letters by POWGI
PCSH10203        Gravity Duck
PCSH10175        Crossovers by POWGI
PCSH10169        Mixups by POWGI
PCSH10167        One Word by POWGI
PCSH10166        Word Search by POWGI
PCSH10179        Word Wheel by POWGI
PCSH10180        Wordsweeper by POWGI
PCSH10168        Word Sudoku by POWGI
PCSB00625    SENRAN KAGURA: Bon Appétit! Stacked Soundtrack

I'm not concerned about the spacing between the ID and the title, as that can be fixed by hand.

Ooooof. I goofed. After running the supplied expression(s), I noticed a smattering of lines that contain the word MISSING followed by various other information. Is there a way to include that in the expression alongside the ( and http?

Alternatively, it can be a separate expression. It just has to be case-sensitive, because I'm concerned that the word "missing" might appear in a title somewhere and the expression would cull everything beyond that point.

PCSG00742    Kiss Ato
PCSG00744    One Piece: Burning Blood - Gold Edition
PCSG00747    Zero Escape: Zero Time Dilemma
PCSG00748    Jikkyou Powerful Pro Yakyuu 2016    MISSING    KO5ifR1dQ+d7
PCSG00750    Kai-ri-Sei Million Arthur
PCSG00751    Arcana Famiglia -La Storia Della Arcana Famiglia- Ancora
PCSG00752    Touhou Soujinengi V
PCSG00753    Eikoku Tantei Mysteria: The Crown    MISSING    KO5ifR1dQ+d7
PCSG00756    I am Setsuna

Best Answer

I need to omit everything from ( and beyond, or everything from http and beyond

  • Menu "Search" > "Replace" (or Ctrl + H)

  • Set "Find what" to \(.*?$|http.*?$

  • Leave "Replace with" empty

  • Enable "Regular expression"

  • Click "Replace All"


Before:

PCSH10160   Attack of the Toy Tanks (3.61+!) [3.69] http://zeu
PCSH10162   Paradox Soul    http://zeus.dl.playstation.net/cdn
PCSH10146   Hoggy2  http://zeus.dl.playstation.net/cdn/HP2005/
PCSB01394   Mekabolt    http://zeus.dl.playstation.net/cdn/EP0
PCSH10186       Himno   http://zeus.dl.playstation.net/cdn/HP2
PCSG01285   MELLKISS    http://zeus.dl.playstation.net/cdn/JP0
PCSB01365   Habroxia    http://zeus.dl.playstation.net/cdn/EP5
PCSE01423   Color Slayer    http://zeus.dl.playstation.net/cdn
PCSE01396   Habroxia    http://zeus.dl.playstation.net/cdn/UP4
PCSG01127   Sen no Hatou, Tsukisome no Kouki    http://zeus.dl
PCSB01396   Tic-Tac-Letters by POWGI    http://zeus.dl.playsta
PCSH10203       Gravity Duck    http://zeus.dl.playstation.net
PCSH10175       Crossovers by POWGI http://zeus.dl.playstation
PCSH10169       Mixups by POWGI (3.61+!) [3.69] http://zeus.dl
PCSH10167       One Word by POWGI   http://zeus.dl.playstation
PCSH10166       Word Search by POWGI    http://zeus.dl.playsta
PCSH10179       Word Wheel by POWGI http://zeus.dl.playstation
PCSH10180       Wordsweeper by POWGI    http://zeus.dl.playsta
PCSH10168       Word Sudoku by POWGI    http://zeus.dl.playsta
PCSB00625   SENRAN KAGURA: Bon Appétit! Stacked Soundtrack  ht

After:

PCSH10160   Attack of the Toy Tanks 
PCSH10162   Paradox Soul    
PCSH10146   Hoggy2  
PCSB01394   Mekabolt    
PCSH10186       Himno   
PCSG01285   MELLKISS    
PCSB01365   Habroxia    
PCSE01423   Color Slayer    
PCSE01396   Habroxia    
PCSG01127   Sen no Hatou, Tsukisome no Kouki    
PCSB01396   Tic-Tac-Letters by POWGI    
PCSH10203       Gravity Duck    
PCSH10175       Crossovers by POWGI 
PCSH10169       Mixups by POWGI 
PCSH10167       One Word by POWGI   
PCSH10166       Word Search by POWGI    
PCSH10179       Word Wheel by POWGI 
PCSH10180       Wordsweeper by POWGI    
PCSH10168       Word Sudoku by POWGI    
PCSB00625   SENRAN KAGURA: Bon Appétit! Stacked Soundtrack  ht
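As a quick cross-check outside Notepad++, here is a minimal sketch in Python that applies the same "Find what" pattern with an empty replacement. It assumes that Python's re module handles this simple pattern (escaped parenthesis, lazy .*?, alternation, $ in multi-line mode) the same way Notepad++'s Boost engine does. The names FIND_WHAT and replace_all are illustrative only; the sample lines come from the question.

```python
import re

# Same pattern as the "Find what" field: from "(" or "http" to the end of the line.
# re.MULTILINE makes $ match at every line end, as it does in Notepad++.
FIND_WHAT = re.compile(r"\(.*?$|http.*?$", re.MULTILINE)


def replace_all(text: str) -> str:
    """Apply the pattern with an empty replacement, like "Replace All"."""
    return FIND_WHAT.sub("", text)


sample = "\n".join([
    "PCSH10160    Attack of the Toy Tanks (3.61+!) [3.69]    http://zeu",
    "PCSH10162    Paradox Soul    http://zeus.dl.playstation.net/cdn",
    "PCSB00625    SENRAN KAGURA: Bon Appétit! Stacked Soundtrack    ht",
])

# repr() makes the whitespace left in front of the cut point visible.
for line in replace_all(sample).splitlines():
    print(repr(line))
```

The output shows the spaces left before the cut point, as in the After listing, and shows that a line truncated to "ht" is left untouched because it contains neither ( nor http.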

Note:

  • The last example line is not cleaned correctly because the sample was truncated in the middle of "http"; it will be correct when you apply the expression to the untruncated file.
  • To also truncate the lines containing MISSING, change "Find what" to \(.*?$|http.*?$|MISSING.*?$ and enable "Match case" so that a lowercase "missing" inside a title is not matched.
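A minimal Python sketch of the extended pattern follows. It makes the same assumption that Python's re behaves like Notepad++ for these constructs. Python matches case-sensitively unless re.IGNORECASE is passed, which corresponds to having "Match case" enabled. The first two sample lines are from the question. The third ID and title are hypothetical, added only to show that a lowercase "missing" survives.

```python
import re

# Extended pattern: "(", "http" or "MISSING" up to the end of the line.
# No re.IGNORECASE flag, so matching is case-sensitive ("Match case" enabled).
FIND_WHAT = re.compile(r"\(.*?$|http.*?$|MISSING.*?$", re.MULTILINE)

lines = [
    "PCSG00748    Jikkyou Powerful Pro Yakyuu 2016    MISSING    KO5ifR1dQ+d7",
    "PCSG00756    I am Setsuna",
    # Hypothetical ID and title, not from the question: lowercase "missing" must survive.
    "PCSX00000    The missing Piece",
]

cleaned = [FIND_WHAT.sub("", line) for line in lines]
for line in cleaned:
    print(repr(line))
```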

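Following the discussion in the comments, the fastest regular expression is the one below. It also removes the whitespace in front of the cut point, so no trailing spaces are left: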

  • \h+(?:\(|http|MISSING).+$
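The same idea can be expressed in Python, as a sketch. Python's re module has no \h escape and rejects it, so this example uses [ \t] as an assumed stand-in for horizontal whitespace. Boost's \h also covers some other Unicode space characters. By default, Python's . does not match a newline, which corresponds to leaving ". matches newline" unticked in Notepad++. If that option were ticked, .+$ could run past the end of the line.

```python
import re

# Python's re has no \h; [ \t] is an assumed stand-in for horizontal whitespace.
# The leading [ \t]+ is part of the match, so the gap before the cut is removed too.
FAST = re.compile(r"[ \t]+(?:\(|http|MISSING).+$", re.MULTILINE)

sample = "\n".join([
    "PCSH10169        Mixups by POWGI (3.61+!) [3.69]    http://zeus.dl",
    "PCSH10167        One Word by POWGI    http://zeus.dl.playstation",
    "PCSG00753    Eikoku Tantei Mysteria: The Crown    MISSING    KO5ifR1dQ+d7",
])

print(FAST.sub("", sample))
```

Unlike the first expression, this leaves no trailing spaces after the title, because the whitespace run in front of the keyword is consumed by the match.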

