Dreamweaver Regex – Why Regex Replace Results in Wrong Character?

dreamweaver, find and replace, notepad, regex



Situation

Here are the regex and the substitution I used for a regex replace in Dreamweaver:

(( and )|( that )|( include )|( includes )|( including ))
$1<br/>\n

image: settings for Regex Replace in Dreamweaver

After running Replace All on the current document, I found a few errors.

For example:

create new Flux and Mono instances

is replaced with (wrong)

create new Flux that <br/>
Mono instances

instead of (correct)

create new Flux and <br/>
Mono instances

(regex101 visualization: https://regex101.com/r/hJTqFg/1)
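For comparison, here is a minimal sketch of the same replacement in Python, to show what a conventional regex engine is expected to produce. It is not Dreamweaver's implementation; the function name add_breaks is only an illustration, and Python writes the backreference differently from Dreamweaver's $1.

```python
import re

# Same alternation as in the question. Group 1 wraps the whole alternation,
# so it always holds the text that actually matched.
PATTERN = re.compile(r"(( and )|( that )|( include )|( includes )|( including ))")


def add_breaks(text):
    """Append '<br/>' and a newline after each matched keyword (hypothetical helper)."""
    return PATTERN.sub(lambda m: m.group(1) + "<br/>\n", text)


sample = "create new Flux and Mono instances"
print(add_breaks(sample))
```

Here group 1 is re-read for every match, so the inserted text can only follow the keyword that was actually found at that position.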

Question

Why is this happening?

How can I avoid this? Is there a safer approach to the replacement?

Has this happened to you before?

Notes:

  • The document is large, and the Replace All performs a lot of replacements in it (300+).

  • While the replacement is running, I can see Dreamweaver scroll quickly through the source code, instead of applying all replacements instantly as other text editors do.

    Is this normal?

    This made me suspect that the problem is related to the replacement speed: perhaps the lag somehow makes Dreamweaver reuse the text of a previous replacement? (I don't know how Dreamweaver is implemented internally.)



  • (There are only a few errors, but this still should not happen.)

  • (The replacement is restricted to the HTML tag <p>. I don't think this is the problem.)

  • (More precisely, the original text is

    create new <code class="literal">Flux</code> and <code class="literal">Mono</code> instances

    rather than just create new Flux and Mono instances.

    I simplified it for readability, but it does not matter.)


  • I am using Dreamweaver 2021.

  • The replacement is run on an .xhtml file.


  • The problem is not only that text gets replaced with another alternative from the | in the regex pattern.

    There are also cases where characters simply disappear (though this is even rarer),

    e.g. blockhound becomes lockhound; 1.0.1.RELEASE becomes .0.1.RELEASE.

    (I applied other regex replace patterns to this document as well, not just this one,

    but these two examples certainly should not match any of the patterns I used.)


  • I ran another regex replace test using the option (Documents in) Folder ... instead of Current Document,

    so the scrolling does not occur (and the replacement seems to run faster).

    Even so, there are still errors,

    so the errors seem to have nothing to do with the scrolling during regex replace in Dreamweaver.
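Since the errors are rare and the document is large, one way to find them is to compare the file before and after the replacement. The sketch below is only an illustration under some assumptions: the replacement is supposed to do nothing except insert the text <br/> followed by a newline, you kept a copy of the original, and the function name unexpected_changes is made up for this example. It removes the inserted text from both versions and reports any lines that still differ.

```python
import difflib


def unexpected_changes(original, replaced, inserted="<br/>\n"):
    """Return (before_lines, after_lines) pairs that differ once the
    inserted text is stripped from both versions (hypothetical helper)."""
    before = original.replace(inserted, "").splitlines()
    after = replaced.replace(inserted, "").splitlines()
    matcher = difflib.SequenceMatcher(None, before, after, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changes.append((before[i1:i2], after[j1:j2]))
    return changes


original = "<p>create new Flux and Mono instances</p>\n<p>blockhound</p>\n"
corrupted = "<p>create new Flux that <br/>\nMono instances</p>\n<p>lockhound</p>\n"
for old, new in unexpected_changes(original, corrupted):
    print("before:", old)
    print("after: ", new)
```

If the editor writes Windows line endings, the inserted argument would need to end in "\r\n" instead; that detail is an assumption to check against your own file.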



Below is a simplified example of the above (in case the above contains redundant information).

In short:

0. You have some text:

<p>AA and BB</p>

1. You use a regex pattern that contains the alternation (or) syntax, |, e.g.:

(( and )|( that ))

2. Your replacement references a capture group, $1, e.g.:

$1foobar

3. You perform a regex Replace All in Dreamweaver that is restricted to one tag, say <p> (I call it the specific-targeting-replacement tag).

4. There is a tag of a different type above the <p>, say <li> (I call it the non-specific-targeting-replacement tag), and this <li> contains the word that, i.e. the alternative that sits next to and in the regex pattern.

  • I call the word that the adjacent-replacement word.

  • I call the word and the to-be-replaced word.

<ul>
  <li>xxxx that xxxx</li>
</ul>

5. Then the text in the <p> tags below that <li> may be replaced with the wrong string (the adjacent-replacement word). For example:

It should be replaced with (correct)

<p>AA and foobarBB</p>

but it may be replaced with (wrong)

<p>AA that foobarBB</p>

img: file & procedure to create this error

The test file:

<!DOCTYPE html>
<html>
<body>
<p>AA and BB</p>
<p>AA and BB</p>
<ul>
  <li>xxxx that xxxx</li>
</ul>
<p>AA and BB</p>
<p>AA and BB</p>
</body>
</html>
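As one possible workaround, the same tag-restricted replacement can be run outside Dreamweaver. The following is a rough sketch in Python on the test file above, not a description of how Dreamweaver works. It assumes that <p> elements are not nested and that finding them with a regex is acceptable for this document; a real HTML parser would be more robust. The names P_ELEMENT, KEYWORD, and replace_in_paragraphs are made up for this example.

```python
import re

TEST_FILE = """<!DOCTYPE html>
<html>
<body>
<p>AA and BB</p>
<p>AA and BB</p>
<ul>
  <li>xxxx that xxxx</li>
</ul>
<p>AA and BB</p>
<p>AA and BB</p>
</body>
</html>
"""

# Step 1: locate each <p>...</p> element (assumption: no nested <p>).
P_ELEMENT = re.compile(r"(<p\b[^>]*>)(.*?)(</p>)", re.DOTALL)
# Step 2: the simplified pattern from the question, applied to the body only.
KEYWORD = re.compile(r"(( and )|( that ))")


def replace_in_paragraphs(html):
    """Apply the '$1foobar' replacement inside <p> elements only."""
    def fix_paragraph(m):
        body = KEYWORD.sub(lambda k: k.group(1) + "foobar", m.group(2))
        return m.group(1) + body + m.group(3)
    return P_ELEMENT.sub(fix_paragraph, html)


result = replace_in_paragraphs(TEST_FILE)
print(result)
```

Because the <li> text never reaches the second step, the word that inside it cannot influence what is inserted into the paragraphs.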

Best Answer

Here is a solution with Notepad++:

  • Ctrl+H
  • Find what: (?:<p>|\G)(?:(?!</p>).)*?\b(?:and|that|includes?|including)\b\K(?=.*?</p>)
  • Replace with: foobar
  • CHECK Wrap around
  • CHECK Regular expression
  • CHECK . matches newline
  • Replace all

Explanation:

(?:         # non capture group
    <p>         # opening tag
  |           # OR
    \G          # restart from last match position
)           # end group
        # Tempered Greedy Token
(?:         # non capture group
    (?!</p>)    # negative lookahead, make sure </p> does not follow
    .           # any character
)*?         # end group, may appear 0 or more times, not greedy
\b          # word boundary
(?:         # non capture group
    and         # literally
  |           # OR
    that        # literally
  |           # OR
    includes?   # literally include OR includes
  |           # OR
    including   # literally
)           # end group
\b          # word boundary
\K          # reset operator, forget all we have seen until this position
(?=         # positive lookahead, make sure we have after:
    .*?         # 0 or more any character, not greedy
    </p>        # closing tag
)           # end lookahead
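The keyword part of this pattern can be tried in isolation. The sketch below rewrites only that part with Python's re module in verbose mode; it leaves out \G and \K, which Python's standard re module does not support, so it is not a full translation of the Notepad++ expression. The helper name matched_words is made up for this example.

```python
import re

# Only the keyword alternation from the explanation above, with the same
# word boundaries on both sides.
KEYWORDS = re.compile(
    r"""
    \b                  # word boundary
    (?:
        and
      | that
      | includes?       # "include" or "includes"
      | including
    )
    \b                  # word boundary
    """,
    re.VERBOSE,
)


def matched_words(text):
    """Return every keyword found as a whole word (hypothetical helper)."""
    return KEYWORDS.findall(text)


print(matched_words("band and that includes include including android"))
```

The word boundaries replace the literal spaces used in the question's pattern, so the keywords are still matched next to punctuation or tags, while words such as band or android are skipped.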
