Sql-server – Update statement with strange side effects to data

sql serverupdate

The problem:
I am trying to add some logic to a database refresh job that will de-identify PHI-related fields, and have come across a strange side effect. I am attempting to exclude a group of patients, in order to allow that set to see their true information, and mask all of the other sets. When I run my update statement, it also updates a large portion of that set I was trying to exclude. Why is that, and how can I change my update statement to avoid that behavior?

The data model:

  1. Patients table – ID as the key, and GroupID to differentiate groups of patients. Also has an IsDeleted field (bit) to indicate soft deletion.t item
  2. Phone table – ID as the key, and has three business fields: AreaCode, PhoneNumber, and Extension. Also has an IsDeleted field (bit) to indicate soft deletion.
  3. PatientPhone – Has an ID as the key, but also uses the combination (PatientID, PhoneID) to link these tables.

The relationship:
Patient to PatientPhone – 1 to Many
Phone to PatientPhone – 1 to Many

In order to validate before and after, I have been using this query to sample the phone numbers by grouping the first couple of digits. It gives me a wide distribution of values, as I would expect:

SELECT
P.GroupID, left(PH.AreaCode, 2) as ShortAreaCode, left(PH.PhoneNumber, 2) as ShortPhoneNumber,
    left(PH.Extension,2) as ShortExtension, count(*)
FROM dbo.Patient P
JOIN dbo.PatientPhone PP
ON P.ID = PP.PatientID
JOIN dbo.Phone PH
ON PP.PatientID = PH.ID
WHERE P.GroupID = 12345
GROUP BY P.GroupID, left(PH.AreaCode, 2), left(PH.PhoneNumber, 2), left(PH.Extension,2)
ORDER BY count(*) desc, P.GroupID, ShortAreaCode, ShortPhoneNumber, ShortExtension

I then run this update statement against the Phone table to default every other patient's phone number to 555-5555:

   UPDATE
        PH
        SET AreaCode = '555', PhoneNumber = '5555555', Extension = NULL
        FROM dbo.Phone as PH
        INNER JOIN dbo.PatientPhone as PP
        on PH.ID = PP.PhoneID
        INNER JOIN dbo.Patient as P
        on P.ID = PP.PatientID
        WHERE P.GroupID <> 12345 --This is the GroupID where we want to retain phone numbers

However, when I re-run my validation query, I see a couple of problems:

  1. Thousands of records I did NOT want to update were updated.
  2. Many of the other groups of Patients were NOT updated.

(Sample of the output afterwards):
Grouping by count, of two patient groups
In theory, the data model allows for phones to be related to more than one patient. We did have some issues in the past, and most of the phone numbers have been denormalized. I need help to understand:

  • How I can explain this side effect of the update statement
  • How I can adjust my update logic to exclude any 'shared' phone numbers, so that the set of patients not in one particular GroupID will have their phones defaulted

I would tend to believe the data (phones being shared) is likely the cause of this problem, but I am having trouble proving that. Can someone help me with this? Thanks for your time.

Best Answer

In the select, you're using a different table - dbo.phone_Copy. The update is running on dbo.phone.