SQL Server – Adding a Python package to SQL Server 2017 Machine Learning Services breaks it

Tags: installation, python, sql server, sql-server-2017

I am trying to get started running Python inside SQL Server 2017 with Python Machine Learning Services.

Out of the box it works and I can run simple scripts like listing the installed packages:

EXEC sp_execute_external_script 
@language = N'Python',
@script = N'import pkg_resources
print([p.project_name for p in pkg_resources.working_set])'

To add a new package, I follow the Microsoft documentation for 2017: https://docs.microsoft.com/en-us/sql/machine-learning/package-management/install-python-packages-standard-tools?view=sql-server-2017
which basically says to pip install the new package. So I go to the SQL Server Python installation folder:

C:\Program Files\Microsoft SQL Server\MSSQL14.MSSQLSERVER\PYTHON_SERVICES

and run:

scripts\pip.exe install pyarrow

This installs an upgraded numpy version as a dependency, and when I then try to run even a simple Python script like the one above, I get the following error:

Msg 39012, Level 16, State 1, Line 0
Unable to communicate with the runtime for 'Python' script. Please check the requirements of 'Python' runtime.

Nothing will run while the newer numpy version is installed. I can fix the issue by repairing the SQL Server installation from the installation media, but that gets me back to my initial setup without the required pyarrow package.
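One way to see exactly what a pip install changed is to record the package versions before and after it. The following is a minimal sketch and not part of the Microsoft procedure: the helper names installed_versions and diff_versions are made up for illustration, and it assumes the script is run directly with the python.exe in the PYTHON_SERVICES folder rather than through sp_execute_external_script. It avoids f-strings so it also runs on older Python versions.

```python
def installed_versions():
    """Return {lowercased project name: version} for the running interpreter."""
    versions = {}
    try:
        from importlib import metadata  # available on Python 3.8 and later
    except ImportError:
        # Fallback for older runtimes: the same API the question's script uses.
        import pkg_resources
        for dist in pkg_resources.working_set:
            versions[dist.project_name.lower()] = dist.version
        return versions

    for dist in metadata.distributions():
        try:
            name = dist.metadata["Name"]
            version = dist.version
        except Exception:
            # Skip entries whose metadata cannot be read (e.g. leftovers).
            continue
        if name:
            versions[name.lower()] = version
    return versions


def diff_versions(before, after):
    """Return {name: (old_version, new_version)} for every package that changed."""
    changes = {}
    for name in set(before) | set(after):
        old, new = before.get(name), after.get(name)
        if old != new:
            changes[name] = (old, new)
    return changes


# Sample data only; in practice both dicts would come from installed_versions(),
# captured in separate interpreter runs before and after the pip command.
before = {"numpy": "1.12.1", "pandas": "0.19.2"}
after = {"numpy": "1.18.5", "pandas": "0.19.2"}
print(diff_versions(before, after))  # {'numpy': ('1.12.1', '1.18.5')}
```

Saving the "before" result to a file prior to running pip would also give a list of versions to pin back to if the runtime stops working.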

After more troubleshooting (including installing specific versions of pandas and numpy), I can no longer get back to a working state with a repair. I will try uninstalling and reinstalling the feature.

I can see that the procedure for installing Python packages has changed in SQL Server 2019 (https://docs.microsoft.com/en-us/sql/machine-learning/package-management/install-additional-python-packages-on-sql-server?view=sql-server-ver15), so there might be flaws in the 2017 procedure, but I can't find them documented anywhere.

I am running the latest SQL Server 2017 (CU21) Developer Edition.

Some suspects:

  • I'm not too familiar with it, but several places say that SQL Server 2017 Machine Learning Services comes with Conda, and I'm not sure whether Conda and pip are compatible.
  • The error suggests that Microsoft's revoscalepy depends on pandas, which in turn depends on numpy.
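The suspected chain (revoscalepy needs pandas, pandas needs numpy) can be checked by importing the modules one at a time, starting from the bottom of the chain, and noting the first one that fails. This is only a sketch: the function name first_broken_import is hypothetical, and it assumes the script is run directly with the python.exe under PYTHON_SERVICES, since in the broken state nothing can be run through sp_execute_external_script.

```python
import importlib


def first_broken_import(module_names):
    """Import each module in order; return (name, error text) for the first failure.

    Returns None when every module imports cleanly.
    """
    for name in module_names:
        try:
            importlib.import_module(name)
        except Exception as exc:  # ImportError is typical, but report anything
            return name, "{0}: {1}".format(type(exc).__name__, exc)
    return None


if __name__ == "__main__":
    # Dependency order taken from the traceback: numpy, then pandas, then revoscalepy.
    print(first_broken_import(["numpy", "pandas", "revoscalepy"]))
```

If numpy imports but pandas does not, the problem is more likely in the pandas installation itself than in the numpy upgrade alone.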

Current full error:

Msg 39012, Level 16, State 1, Line 0
Unable to communicate with the runtime for 'Python' script. Please check the requirements of 'Python' runtime.
STDERR message(s) from external script: 
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Program Files\Microsoft SQL Server\MSSQL14.MSSQLSERVER\PYTHON_SERVICES\lib\site-packages\revoscalepy\__init__.py", line 99, in <module>
    from .RxSerializable import RxMissingValues
  File "C:\Program Files\Microsoft SQL Server\MSSQL14.MSSQLSERVER\PYTHON_SERVICES\lib\site-packages\revoscalepy\RxSerializable.py", line 10, in <module>
    from pandas import DataFrame
  File "C:\Program Files\Microsoft SQL Server\MSSQL14.MSSQLSERVER\PYTHON_SERVICES\lib\site-packages\pandas\__init__.py", line 39, in <module>
    from pandas.core.api import *
  File "C:\Program Files\Microsoft SQL Server\MSSQL14.MSSQLSERVER\PYTHON_SERVICES\lib\site-packages\pandas\core\api.py", line 10, in <module>
    from pandas.core.groupby import Grouper

STDERR message(s) from external script: 
  File "C:\Program Files\Microsoft SQL Server\MSSQL14.MSSQLSERVER\PYTHON_SERVICES\lib\site-packages\pandas\core\groupby\__init__.py", line 1, in <module>
    from pandas.core.groupby.groupby import GroupBy  # noqa: F401
  File "C:\Program Files\Microsoft SQL Server\MSSQL14.MSSQLSERVER\PYTHON_SERVICES\lib\site-packages\pandas\core\groupby\groupby.py", line 25, in <module>
    from pandas.util._validators import validate_kwargs
  File "C:\Program Files\Microsoft SQL Server\MSSQL14.MSSQLSERVER\PYTHON_SERVICES\lib\site-packages\pandas\util\_validators.py", line 7, in <module>
    from pandas.core.dtypes.common import is_bool
  File "C:\Program Files\Microsoft SQL Server\MSSQL14.MSSQLSERVER\PYTHON_SERVICES\lib\site-packages\pandas\core\dtypes\common.py", line 10, in <module>
    from pandas.core.dtypes.dtypes import (

STDERR message(s) from external script: 
  File "C:\Program Files\Microsoft SQL Server\MSSQL14.MSSQLSERVER\PYTHON_SERVICES\lib\site-packages\pandas\core\dtypes\dtypes.py", line 16, in <module>
    from .inference import is_list_like
  File "C:\Program Files\Microsoft SQL Server\MSSQL14.MSSQLSERVER\PYTHON_SERVICES\lib\site-packages\pandas\core\dtypes\inference.py", line 9, in <module>
    from pandas.compat import (
ImportError: cannot import name 'Set'


Completion time: 2020-08-25T09:43:23.9780283+02:00

Best Answer

I had the same type of issue with the pandas module on SQL Server 2017 with the same ImportError message that you received. The issue surfaced after installing CU22 for SQL Server 2017 recently and was not present when I set up the environment in question about two months earlier.

In my setup of the environment I performed an upgrade of the numpy and pandas modules to the latest available versions for Python 3.5.2 (which is the default Python version included with SQL Server 2017):

  • numpy:
    • Pre-installed version: 1.12.1
    • Updated version: 1.18.5
  • pandas:
    • Pre-installed version: 0.19.2
    • Updated version: 0.24.2

The upgrades were needed because other modules required a newer numpy, and there were compatibility issues between the pre-installed pandas and the new numpy version.

After receiving the ImportError when running import pandas, I traced it to lines 9-10 of a file under:

C:\Program Files\Microsoft SQL Server\MSSQL14.<instance_name>\PYTHON_SERVICES\Lib\site-packages\pandas\core\dtypes

from pandas.compat import (
    PY2, Set, re_type, string_and_binary_types, string_types, text_type)

As I could not find an explanation or any reference from Microsoft on why this should stop working, I did some research and found this issue in the pandas GitHub project:

Missing required dependencies ['numpy'] on pandas 0.24.1 #25316

It seems that remnants of previously installed numpy and pandas versions can cause problems, even though pip.exe is supposed to remove them when it upgrades or installs a module.

Fair enough. I ran repeated uninstalls of the pandas and numpy modules until pip.exe no longer reported any installed versions and the related directories were gone from Lib\site-packages:

Set-Location 'C:\Program Files\Microsoft SQL Server\MSSQL14.<instance_name>\PYTHON_SERVICES\Scripts'
& '.\pip.exe' uninstall pandas # repeated 2 times
& '.\pip.exe' uninstall numpy # repeated 2 times

For both modules, this removed the updated version as well as some leftover parts of the original version.
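Whether anything is still left in Lib\site-packages can also be checked with a short script instead of by eye. The sketch below is an assumption-laden helper, not something pip provides: leftover_entries is a made-up name, and the matching is deliberately simple (the package directory itself, plus metadata entries named like "numpy-<version>...dist-info" or "...egg-info").

```python
import os


def leftover_entries(site_packages, project):
    """List entries in site_packages that appear to belong to the given project."""
    project = project.lower()
    found = []
    for entry in sorted(os.listdir(site_packages)):
        lower = entry.lower()
        # The importable package directory, e.g. "numpy".
        is_package_dir = lower == project
        # Installer metadata, e.g. "numpy-1.18.5.dist-info"; the trailing "-"
        # keeps unrelated projects such as "numpydoc" from matching.
        is_metadata = lower.startswith(project + "-") and lower.endswith(
            (".dist-info", ".egg-info")
        )
        if is_package_dir or is_metadata:
            found.append(entry)
    return found
```

Calling it with the instance's Lib\site-packages path and "numpy" or "pandas" should return an empty list once the uninstalls are complete; more than one metadata entry for the same project would point to a mixed installation.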

I then reinstalled both modules and had no issues after that:

& '.\pip.exe' install --upgrade numpy
& '.\pip.exe' install --upgrade pandas
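After the reinstall, it can be worth confirming which version of each module is actually loaded and from where. This is a small sketch under the same assumptions as before (run with the python.exe in PYTHON_SERVICES); describe_module is a hypothetical helper name.

```python
import importlib


def describe_module(name):
    """Import a module and report its version and the file it was loaded from."""
    module = importlib.import_module(name)
    return {
        "name": name,
        "version": getattr(module, "__version__", None),
        "file": getattr(module, "__file__", None),
    }


if __name__ == "__main__":
    for module_name in ("numpy", "pandas"):
        try:
            print(describe_module(module_name))
        except ImportError as exc:
            print("{0} failed to import: {1}".format(module_name, exc))
```

The reported file paths should point into the instance's Lib\site-packages folder; a path elsewhere would mean a different Python installation is being picked up.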