SSAS Cube processing randomly fails with 08S01 error

ssas

We have a cube that is scheduled to process every morning at 6:00 on SSAS 10.50.2500.0.

Sometimes it completes successfully. More often than not, it fails.

When it fails, SQL Server has this entry in the log (XML beautified):

[136] Job OLAP_Refresh reported: 
<return xmlns="urn:schemas-microsoft-com:xml-analysis">
    <results xmlns="http://schemas.microsoft.com/analysisservices/2003/xmla-multipleresults">
        <root xmlns="urn:schemas-microsoft-com:xml-analysis:empty">
            <Exception xmlns="urn:schemas-microsoft-com:xml-analysis:exception" />
            <Messages xmlns="urn:schemas-microsoft-com:xml-analysis:exception">
                <Error ErrorCode="3238395904" Description="OLE DB error: OLE DB or ODBC error: Communication link failure; 08S01; TCP Provider: The specified network name is no longer available.&#xA;; 08S01." Source="Microsoft SQL Server 2008 R2 Analysis Services" HelpFile="" />
                <Error ErrorCode="3240034318" Description="Errors in the OLAP storage engine: An error occurred while processing the '__PARTITION__' partition of the '__MEASURE_GROUP__' measure group for the '__CUBE__' cube from the __OLAP__ database." Source="Microsoft SQL Server 2008 R2 Analysis Services" HelpFile="" />
                <Error ErrorCode="3238002695" Description="Internal error: The operation terminated unsuccessfully." Source="Microsoft SQL Server 2008 R2 Analysis Services" HelpFile="" />
                <Error ErrorCode="3239837698" Description="Server: The operation has been cancelled." Source="Microsoft SQL Server 2008 R2 Analysis Services" HelpFile="" />
            </Messages>
        </root>
    </results>
</return>

__PARTITION__ and __MEASURE_GROUP__ are not always the same, and once in a while the job finishes without a problem.

msmdsrv.log has entries like this:

($DATE $TIME) Message: OLE DB error: OLE DB or ODBC error: Communication link failure; 08S01; TCP Provider: The specified network name is no longer available.
; 08S01. (Source: \\?\P:\Microsoft SQL Server\MSSQL.2\OLAP\Log\msmdsrv.log, Type: 3, Category: 289, Event ID: 0xC1210003)
($DATE $TIME) Message: OLE DB error: OLE DB or ODBC error: Protocol error in TDS stream; HY000; Communication link failure; 08S01; TCP Provider: An established connection was aborted by the software in your host machine.
; 08S01; Communication link failure; 08S01; TCP Provider: An established connection was aborted by the software in your host machine.
; 08S01; Communication link failure; 08S01; TCP Provider: An established connection was aborted by the software in your host machine.
; 08S01. (Source: \\?\P:\Microsoft SQL Server\MSSQL.2\OLAP\Log\msmdsrv.log, Type: 3, Category: 289, Event ID: 0xC1210003)

The data center swears that there are no network issues.

$TIME is usually around 6:40, though today it failed around 8:40.

ExternalCommandTimeout is set to 50000.

The update doesn't run in parallel; here's what the job sends every morning:

<Batch xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <ErrorConfiguration xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:ddl2="http://schemas.microsoft.com/analysisservices/2003/engine/2" xmlns:ddl2_2="http://schemas.microsoft.com/analysisservices/2003/engine/2/2" xmlns:ddl100_100="http://schemas.microsoft.com/analysisservices/2008/engine/100/100" xmlns:ddl200="http://schemas.microsoft.com/analysisservices/2010/engine/200" xmlns:ddl200_200="http://schemas.microsoft.com/analysisservices/2010/engine/200/200">
    <KeyNotFound>IgnoreError</KeyNotFound>
    <NullKeyNotAllowed>IgnoreError</NullKeyNotAllowed>
  </ErrorConfiguration>
  <Process xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:ddl2="http://schemas.microsoft.com/analysisservices/2003/engine/2" xmlns:ddl2_2="http://schemas.microsoft.com/analysisservices/2003/engine/2/2" xmlns:ddl100_100="http://schemas.microsoft.com/analysisservices/2008/engine/100/100" xmlns:ddl200="http://schemas.microsoft.com/analysisservices/2010/engine/200" xmlns:ddl200_200="http://schemas.microsoft.com/analysisservices/2010/engine/200/200">
    <Object>
      <DatabaseID>__OLAP__</DatabaseID>
    </Object>
    <Type>ProcessFull</Type>
    <WriteBackTableCreation>UseExisting</WriteBackTableCreation>
  </Process>
</Batch>

Executing the job manually works more reliably (I started it three times and it failed once with the same errors), though that might be because I start it between 10:00 and 14:00.

Any idea what the cause might be and how to fix it?

Best Answer

The message "TCP Provider: The specified network name is no longer available." is pretty clear: you have a TCP connectivity issue.

It may be a configuration issue on either the local machine or the server. If SSAS is running on the same machine as SQL Server, you may be seeing resource exhaustion causing TCP to drop packets. Confirm you have the recommended TCP/IP stack settings configured for your particular machine(s).

If you have TCP Chimney enabled, that might be causing an issue.
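To check whether TCP Chimney is on, Windows Server 2008 and 2008 R2 report it through `netsh int tcp show global`, and `netsh int tcp set global chimney=disabled` turns it off. The sketch below is only a small helper for reading that setting from a script. The function names are made up for illustration, and it assumes the output contains a line labelled "Chimney Offload State", which may differ on other Windows versions or locales.

```python
import subprocess


def chimney_state(netsh_output):
    """Return the Chimney Offload value from `netsh int tcp show global` output.

    Assumes a line of the form "Chimney Offload State : <value>".
    Returns None if no such line is found.
    """
    for line in netsh_output.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "chimney offload state":
            return value.strip().lower()
    return None


def read_chimney_state():
    """Run netsh on the local Windows machine and parse its output."""
    result = subprocess.run(
        ["netsh", "int", "tcp", "show", "global"],
        capture_output=True,
        text=True,
        check=True,
    )
    return chimney_state(result.stdout)
```

Calling `read_chimney_state()` on both the SSAS host and the SQL Server host should show whether either one still has offload active; anything other than `disabled` is worth ruling out with a test run after disabling it.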

If the two machines are physically on different networks, you may have a routing issue.
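One way to gather evidence for or against a network problem is to run a simple TCP probe from the SSAS machine against the SQL Server port during the processing window and log every failure with a timestamp. The following is a minimal sketch, not a definitive diagnostic: the function names are hypothetical, and the host and port you pass in are your own (1433 is only the default port for a default SQL Server instance). It tests new connections only, so it will not catch every case where an already established connection is dropped mid-query.

```python
import socket
import time
from datetime import datetime


def probe_once(host, port, timeout=5.0):
    """Attempt a single TCP connect and return (ok, detail)."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            elapsed_ms = (time.monotonic() - start) * 1000
            return True, f"connected in {elapsed_ms:.0f} ms"
    except OSError as exc:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False, f"{type(exc).__name__}: {exc}"


def probe_loop(host, port, attempts, interval_s=10.0, log=print):
    """Probe repeatedly, log each result, and return the number of failures."""
    failures = 0
    for i in range(attempts):
        ok, detail = probe_once(host, port)
        if not ok:
            failures += 1
        stamp = datetime.now().isoformat(timespec="seconds")
        status = "OK" if ok else "FAIL"
        log(f"{stamp} {status} {detail}")
        if i < attempts - 1:
            time.sleep(interval_s)
    return failures
```

For example, `probe_loop("sqlhost.example.local", 1433, attempts=1080)` (with a placeholder host name) would cover roughly three hours at the default 10-second interval. If the FAIL timestamps line up with the times in msmdsrv.log, that is something concrete to take back to the data center.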