Sql-server – Prime numbers in a given range

sql serversql-server-2008-r2t-sql

Recently, I was given a task to print all the prime numbers (1-100). I failed drastically there. My Code:

Create Procedure PrintPrimeNumbers
@startnum int,
@endnum int
AS 
BEGIN
Declare @a INT;
Declare @i INT = 1
(
Select a = @startnum / 2;
WHILE @i<@a
BEGIN
@startnum%(@a-@i)
i=i+1;
)
END

Though I ended up without completing it, I wonder is it feasible to do such programs on Database (SQL Server 2008 R2).

If yes, how it may end.

Best Answer

By far the quickest and easiest way to print "all the prime numbers (1-100)" is to fully embrace the fact that prime numbers are a known, finite, and unchanging set of values ("known" and "finite" within a particular range, of course). At this small of a scale, why waste CPU each time to calculate a bunch of values that have been known for a very long time, and take up hardly any memory to store?

SELECT tmp.[Prime]
FROM   (VALUES (2), (3), (5), (7), (11), (13),
        (17), (19), (23), (29), (31), (37), (41),
        (43), (47), (53), (59), (61), (67), (71),
        (73), (79), (83), (89), (97)) tmp(Prime)

Of course, if you do need to calculate the prime numbers between 1 and 100, the following is fairly efficient:

;WITH base AS
(
    SELECT tmp.dummy, ROW_NUMBER() OVER (ORDER BY (SELECT 1)) AS [num]
    FROM   (VALUES (0), (0), (0), (0), (0), (0), (0)) tmp(dummy)
), nums AS
(
    SELECT  (ROW_NUMBER() OVER (ORDER BY (SELECT 1)) * 2) + 1 AS [num]
    FROM        base b1
    CROSS JOIN  base b2
), divs AS
(
    SELECT  [num]
    FROM        base b3
    WHERE   b3.[num] > 4
    AND     b3.[num] % 2 <> 0
    AND     b3.[num] % 3 <> 0
)
SELECT  given.[num] AS [Prime]
FROM        (VALUES (2), (3)) given(num)
UNION ALL
SELECT  n.[num] AS [Prime]
FROM        nums n
WHERE   n.[num] % 3 <> 0
AND     NOT EXISTS (SELECT *
                    FROM divs d
                    WHERE d.[num] <> n.[num]
                    AND n.[num] % d.[num] = 0
                    );

This query only tests odd numbers as even numbers won't be prime anyway. It is also specific to the range of 1 - 100.

Now, if you need a dynamic range (similar to what is shown in the example code in the question), then the following is an adaptation of the query above that is still rather efficient (it calculated the range of 1 - 100,000 -- 9592 entries -- in just under 1 second):

DECLARE  @RangeStart INT = 1,
         @RangeEnd INT = 100000;
DECLARE  @HowMany INT = CEILING((@RangeEnd - @RangeStart + 1) / 2.0);

;WITH frst AS
(
    SELECT  tmp.thing1
    FROM        (VALUES (0), (0), (0), (0), (0), (0), (0), (0), (0), (0)) tmp(thing1)
), scnd AS
(
    SELECT  0 AS [thing2]
    FROM        frst t1
    CROSS JOIN frst t2
    CROSS JOIN frst t3
), base AS
(
    SELECT  TOP( CONVERT( INT, CEILING(SQRT(@RangeEnd)) ) )
            ROW_NUMBER() OVER (ORDER BY (SELECT 1)) AS [num]
    FROM        scnd s1
    CROSS JOIN  scnd s2
), nums AS
(
    SELECT  TOP (@HowMany)
            (ROW_NUMBER() OVER (ORDER BY (SELECT 1)) * 2) + 
                (@RangeStart - 1 - (@RangeStart%2)) AS [num]
    FROM        base b1
    CROSS JOIN  base b2
), divs AS
(
    SELECT  [num]
    FROM        base b3
    WHERE   b3.[num] > 4
    AND     b3.[num] % 2 <> 0
    AND     b3.[num] % 3 <> 0
)
SELECT  given.[num] AS [Prime]
FROM        (VALUES (2), (3)) given(num)
WHERE   given.[num] >= @RangeStart
UNION ALL
SELECT  n.[num] AS [Prime]
FROM        nums n
WHERE   n.[num] BETWEEN 5 AND @RangeEnd
AND     n.[num] % 3 <> 0
AND     NOT EXISTS (SELECT *
                    FROM divs d
                    WHERE d.[num] <> n.[num]
                    AND n.[num] % d.[num] = 0
                    );

My testing (using SET STATISTICS TIME, IO ON; ) shows that this query performs better than the other two answers given (so far):

RANGE: 1 - 100

Query      Logical Reads       CPU Milliseconds    Elapsed Milliseconds
-------    ----------------    ----------------    -----------------

Solomon      0                 0                   0
Dan        396                 0                   0
Martin     394                 0                   1

RANGE: 1 - 10,000

Query      Logical Reads       CPU Milliseconds    Elapsed Milliseconds
-------    ----------------    ----------------    -----------------

Solomon        0                   47                170
Dan        77015                 2547               2559
Martin       n/a

RANGE: 1 - 100,000

Query      Logical Reads       CPU Milliseconds    Elapsed Milliseconds
-------    ----------------    ----------------    -----------------

Solomon            0                 984                996
Dan        3,365,469             195,766            196,650
Martin           n/a

RANGE: 99,900 - 100,000

NOTE: In order to run this test I had to fix a bug in Dan's code -- @startnum was not factored into the query so it always started at 1. I replaced the Dividend.num <= @endnum line with Dividend.num BETWEEN @startnum AND @endnum.

Query      Logical Reads       CPU Milliseconds    Elapsed Milliseconds
-------    ----------------    ----------------    -----------------

Solomon       0                   0                   1
Dan           0                 157                 158
Martin      n/a

RANGE: 1 - 100,000 (partial re-test)

After fixing Dan's query for the 99,900 - 100,000 test, I noticed that there were no more logical reads listed. So I retested this range with that fix still applied and found that the logical reads were again gone and the times were slightly better (and yes, the same number of rows were returned).

Query      Logical Reads       CPU Milliseconds    Elapsed Milliseconds
-------    ----------------    ----------------    -----------------

Dan                0             179,594            180,096