SQL Server – Using CLR for Regular Expressions vs External Scripts

external-scriptspythonregular expressionsql serversql-clr

Problem

The main problem we need to use regular expression on MS SQL Server 2019, with the capability
of at least the level on the POSIX Regular expression.

Possible solutions

This Q/A from stackoverflow rightly concludes that if you query must rely on regular expressions you shuould use CLR. This Readgate article elaborates this approach more. So one of my colleagues and I proposed this solution, but my other colleague categorically stated that using CLR here would be a huge risk to security and stability, and using external script (Python or R) is more secure.

This seems to be dubious claim, since the user code in the CLR can be managed, so perhaps the opposite is true, but I was not able to persuade my colleague.

In my other question which I wrote in my desperation because I was forced to use external script and still produce a blazing fast query. SQLpro user states in his comment that:

Using Python or R can be worst in terms of security rather than using CLR!

Which I tend to believe.

Questions

So I have two questions:

  1. Which Regexp solution is more secure external script or CLR based (as described here)? And why?

  2. I also proposed to run the python code on the same Windows Server (must be the same server, because of a policy) but with python intrepeter installed on the OS. Because the results are exported into CSV files either way and stored in the SQL Server. So then I would able to use Python's multiprocessing module to achieve the right performance. The answer was the same that running Python inside SQL Server is more secure than in a outside application. Which is also a questionable claim.

Best Answer

FIRST : the security level of SQL CLR is somtehing like airbag + ABS + Lane Sense for a car. When you use the three devices you are at the top level in terms of security. That is the SAFE level of SQL CLR. When one of this disposal is not used your are in the EXTERNAL_ACCESS and without any of these you are in UNSAFE.

SECOND : there is no such security level in any other languages at all, making believe those other languages are safe which is not true.... If you want to drive a car and all the previous devices are missing, so there is no indication of any security level in this car... is this car safe ?

An an example, SQL CLR processes can be dropped out in a sandbox in case of bad misbehavior. This is not true at all for other languages ! I always do the same trick to proove it : using SQL CLR with an infinite recursion. The process wil be shot and SQL Server service will survive. Using an external Java machine to do the same, will drive your SQL Server to be down because Java will acquire all the memory to the detriment of SQL Server...

Now you have the choice !