Keep Getting I/O Error Reading Sybase Tables

sybase

I keep getting the following error message from my Sybase server:

I/O error detected during read for
BUF pointer = '0xed318720',
MASS pointer = '0xed318720',
(Buf#: '0'),
page ptr = '0xecdbe000',
dbid = '5',             <-- This is our production database
Mass virtpage = '50372477',
Buffer page = '40829',
Mass status = '0x4000100',
Buffer status = '0x1',
size = '2048',
cache (id: 0) = 'default data cache'.

The problem started from last week. I got around the problem by dropping the database, rebuilding it from a backup in a backup-server. This was OK for a week or so. Now the same problem comes right back.

I have just attempted to apply the same workaround (drop-db and rebuild-from-backup). But this time the problem doesn't go away. The error message is slightly changed into this:

An attempt was made to read logical page '41175'
for object '899820055' in database '5' from disk.
Wrong logical page '41303' was brought into cache
'default data cache'.

Seem like the workaround doesn't really fix the underlying problem. I would like to know what the problem is.

I have a feeling that some areas in the file-system may be corrupted. I am trying to use fsck command to fix the problem. That is the exact command that I use:

fsck /dev/rdsk/c0t0d0s1 <-- It is the device of the database file

But I get this error message:

** /dev/rdsk/c0t0d0s1
BAD SUPER BLOCK: MAGIC NUMBER WRONG
USE AN ALTERNATE SUPER-BLOCK TO SUPPLY NEEDED INFORMATION;
eg. fsck [-F ufs] -o b=# [special ...]
where # is the alternate super block. SEE fsck_ufs(1M).

My questions are:
1. Is my database problem related to file-system corrupted?
2. Is fsck theright tool to fix this problem?
3. What does that error message from fsck mean?
4. How to proceed from here?

Please help. Thanks.

The following are the system info and other error messages:

"SELECT @@VERSION" SQL command returns this:

SQL Server/11.0.3.2/P/Sun_svr4/OS 5.4/SWR 7578 Rollup/OPT/Mon Nov  3 22:19:21 PST 1997

"uname -a" Unix shell command returns this:

SunOS <my-server-name> 5.4 Generic_101945-62 sun4m sparc

"df" Unix shell command returns this:

/            (/dev/dsk/c0t3d0s0 ):  412984 blocks   287260 files
/usr         (/dev/dsk/c0t3d0s6 ):  884768 blocks   238161 files
/proc        (/proc             ):       0 blocks     8146 files
/dev/fd      (fd                ):       0 blocks        0 files
/tmp         (swap              ): 2675424 blocks   102850 files
/export      (/dev/dsk/c0t3d0s7 ):  378846 blocks   548286 files
/freespace   (/dev/dsk/c0t0d0s5 ): 4900998 blocks  1227260 files
/sybase      (/dev/dsk/c0t0d0s0 ): 5159274 blocks  1443285 files
/usr/openwin (/dev/dsk/c0t3d0s3 ):  293690 blocks    94533 files

Cannot find "memsize" command or "hw" command (regardless if I use the "root" account or not). Find the installed memory size during boot-up, and that is 512 MB.

"vmstat" Unix shell command returns this:

procs     memory            page            disk          faults      cpu
r b w   swap  free  re  mf pi po fr de sr f0 s0 s3 --   in   sy   cs us sy id
0 0 0 183396  6448   0   1  5  0  0  0  0  0  0  0  0   29  214   49  0  2 98

Cannot find "sar" command (regardless if I use the "root" account or not). I cannot determine the default data cache size.

I look into the log file (/var/log/syslog). But I cannot find any error message.

"dbcc checkstorage" SQL command is not available.

"dbcc checkdb(my-db-name)" SQL command shows this warning/error messages:

.
.
Checking syslogs
The total number of data pages in this table is 1328.
*** NOTICE:  Space used on the log segment is 2.59 Mbytes, 1.21%.
*** NOTICE:  Space free on the log segment is 212.41 Mbytes, 98.79%.
Table has 25527 data rows.
.
.
Checking mchdwglk
The total number of data pages in this table is 55.
Table has 973 data rows.
Msg 695, Level 16, State 1:
Line 1:
An attempt was made to read logical page '41175' for object '899820055' in
database '5' from disk. Wrong logical page '41303' was brought into cache
'default data cache'.
.
.

"dbcc checkalloc(my-db-name)" SQL command shows this warning/error messages:

Checking isprod
Database 'isprod' is not in single user mode - may find spurious allocation
problems due to transactions in progress.
.
.
***************************************************************
TABLE: syslogs          OBJID = 8
INDID=0  FIRST=199767    ROOT=201099     SORT=0
        Data level: 0.  1328 Data pages allocated and 166 Extents allocated.
TOTAL # of extents = 166
***************************************************************
.
.
***************************************************************
TABLE: puldat           OBJID = 360440408
INDID=1  FIRST=2465      ROOT=2394       SORT=1
        Data level: 1.  23 Data pages allocated and 3 Extents allocated.
        Indid     : 1.  1 Index pages allocated and 1 Extents allocated.
INDID=2  FIRST=2370      ROOT=33460      SORT=1
Msg 7940, Level 18, State 1:
Line 1:
The counts in the OAM are incorrect.  This implies that there are entries
missing.  Run tablealloc utility with the FIX option on the table with the
inaccurate OAM counts.
        Indid     : 2.  20 Index pages allocated and 36 Extents allocated.
TOTAL # of extents = 40
***************************************************************
.
.
***************************************************************
TABLE: mchdwglk         OBJID = 899820055
INDID=1  FIRST=5305      ROOT=3785       SORT=1
        Data level: 1.  55 Data pages allocated and 7 Extents allocated.
        Indid     : 1.  1 Index pages allocated and 1 Extents allocated.
INDID=2  FIRST=3778      ROOT=3777       SORT=1
Msg 2529, Level 16, State 7:
Line 1:
Table Corrupt: Attempted to get page 41175, object 899820055;  got page 0,
object 0.
TOTAL # of extents = 8
***************************************************************
.
.
***************************************************************
TABLE: pcno             OBJID = 1151395221
INDID=1  FIRST=441       ROOT=434        SORT=1
        Data level: 1.  1 Data pages allocated and 1 Extents allocated.
        Indid     : 1.  1 Index pages allocated and 1 Extents allocated.
INDID=2  FIRST=314       ROOT=313        SORT=1
        Indid     : 2.  3 Index pages allocated and 2 Extents allocated.
INDID=3  FIRST=450       ROOT=450        SORT=1
Msg 7940, Level 18, State 1:
Line 1:
The counts in the OAM are incorrect.  This implies that there are entries
missing.  Run tablealloc utility with the FIX option on the table with the
inaccurate OAM counts.
        Indid     : 3.  1 Index pages allocated and 1 Extents allocated.
TOTAL # of extents = 5
***************************************************************
.
.
Processed 418 entries in the sysindexes for dbid 5.
Found 1 bad entries in the sysindexes.

Jay Chan

Best Answer

Converting my comment to answer:

Can you run SELECT @@VERSION and post the output. What is max memory and default data cache value ? Run dbcc checkstorage, the dbcc checkdb and dbcc checkalloc commands and post the output as well. This is a typical hardware (disk) problem. Also check OS logs. They will give you correct error. Check An attempt was made to read logical page..Wrong logical page