SQL Server – Exclude Specific Words in WHERE Clause

sql serverssmst-sql

I have a table that stores invoices and I'm trying to get stats out of it, unfortunately the table is poorly build and some crucial information is all mixed up on a nvarchar field, information such as whether the invoice has been cancelled or if part of the charge is exempt resides on this field in a very daliesque string that gets parsed by the frontend. 3453.234;exempt;Invoice Total...

So I want to create a query that would exclude a few words. My problem is how to accomplish a query that can exclude records depending of a list of keywords (cancelled, exempt), so if any of this words is in the field the amount would not be taken in account.

Best Answer

Its not too difficult to build a query that does it. First create a query that joins on a table using LIKE with wildcards. Then exclude everything from this query.

To see what I mean, see this simple example: http://sqlfiddle.com/#!6/619fb/2

Or alternately I have reproduced an example here:

--Create tables for comparing data
CREATE TABLE Invoice
(
  InvoiceID INT NOT NULL IDENTITY,
  InvoiceData varchar(200) NOT NULL,
);

CREATE TABLE BadWords
(
  BadWordID INT NOT NULL IDENTITY,
  BadWord varchar(10) NOT NULL
);

--Insert data
INSERT INTO Invoice (InvoiceData)
VALUES ('This is some invoice data'), ('it is about'), ('something interesting that'),
('you should look at'), ('because its got invoice information');

INSERT INTO BadWords (BadWord)
VALUES ('this'),('invoice');

--Test query:
SELECT *
FROM Invoice
WHERE InvoiceID NOT IN (
  SELECT InvoiceID
  FROM Invoice i
  INNER JOIN BadWords b ON i.InvoiceData LIKE '%' + b.BadWord + '%'
)

Performance may be an issue with this if you have hundreds of thousands of rows. Without more information though it would be difficult to give you a high performing query (for example using a NOT CONTAINS free text query may give better performance).