A reasonable answer to this must IMHO ask (or at least imply) the question:
Why were the tables split to begin with?
(More importantly: does that reason still apply?)
The general case:
I do not know the specifics of how the postgres db engine works,
but the disk space saved by merging is roughly bounded by
the size of the duplicated columns removed
+ the indexes on those duplicated columns
+ a bit of per-row housekeeping data
Even that bound is quite small, so it's probably not going to be worth the trouble.
It may even be that the tables were split for performance reasons: if some columns are accessed (read, or even updated) much more often than the rest, keeping them in a narrow table (fewer columns) makes those accesses faster. (At some penalty when you need columns from both tables at the same time.)
The specific case:
It looks like there are a lot of redundant fields between the two tables. user_id, for example, would not need to be there twice. The begin/end/X timestamps could perhaps be reduced to two or one. So there could be some slight saving from that, depending on the application's requirements. Again, try to figure out why they split the tables to begin with.
Overall, though, I agree with Bryan Agee; disk space should probably not be your concern on anything big enough to run postgres to begin with. Particularly if you get paid for the time you spend on "fixing" this, that cost is likely to exceed the cost of the bigger disk by itself.
Per-row space:
I'm not well versed in postgres particulars, so someone in the know should be able to correct anything wrong here - such as whether there is a way to ask postgres how much space a specific row actually takes up; I don't know of one.
What I'm writing is basically the theory behind database storage as (I believe) commonly done today.
Each field has a bit (in a byte of its own, or in a bitset common for the whole row) which says if the value is a null or not. If it's a null value, there's nothing more stored.
Then there's a length byte - unless this is implicit from fixed-width data types. Then there's the data itself.
Thus, a row of one int (even 64-bit) and three null values may take as little as 3 bytes: one shared null-bitmap byte, plus, for a value < 127, a size byte of 1 and the single data byte.
Plus indexes, and various other housekeeping metadata. Again, I don't know just how far postgres goes in this regard. And these factors all taken together will tend to make "how much does this row occupy" a question with a useless answer.
AFAIK postgres also operates with "pages" - blocks of space that a row may not be stored across the boundaries of. So, larger records may more often end up "just not fitting" in a page and so need to be placed in another/new page.
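For what it's worth, postgres does provide a way to check this on a live database: pg_column_size() reports the stored size of a single value or of a whole row, and pg_total_relation_size() adds indexes, TOAST and the rest of the housekeeping. A minimal sketch (the table and column names here are made up for illustration):

-- pg_column_size(row) includes the per-row header; pg_column_size(col) is just that one value
create table space_demo (a bigint, b text, c timestamp, d integer);
insert into space_demo values (1, null, null, null);
insert into space_demo values (1, 'hello', now(), 42);

select pg_column_size(t.*) as row_bytes,  -- whole row as stored
       pg_column_size(a)   as a_bytes     -- just the bigint column
from space_demo t;

-- table + indexes + TOAST, i.e. the "housekeeping" part
select pg_total_relation_size('space_demo') as total_bytes;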
A DATA_LENGTH of 22 is just the largest number of bytes it can take to store the largest possible number (Oracle uses 2 digits per byte). A NULL DATA_PRECISION implies the maximum of 38.
You can verify this with DBMS_METADATA:
SQL> create table sotmp ( a int );
Table created.
SQL> select dbms_metadata.get_ddl('TABLE','SOTMP')
2 from dual;
DBMS_METADATA.GET_DDL('TABLE','SOTMP')
--------------------------------------------------------------------------------
CREATE TABLE "PHIL"."SOTMP"
( "A" NUMBER(*,0)
) SEGMENT CREATION DEFER
SQL>
(The "*" in the above output denotes the maximum precision).
BINARY_INTEGER is not a data type that can be used in table columns; it can only be used as a PL/SQL variable datatype.
Best Answer
It depends on your data (and this assumes that you don't care about the different behaviors of the two data types). You can put together a test with sample data that is similar to what you're actually storing to find out what approach will use less space.
I'll create a table num_test with two different columns, one declared as a number and one as a binary_float. If I populate it with 100,000 rows of data in increments of 0.001, I can then see the minimum, maximum, and average size required to store the values.
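The original code blocks didn't survive here, so this is a sketch of what the setup probably looked like (the table name num_test and the column flt_col appear in the text; num_col and the exact INSERT are my guesses):

create table num_test (
  num_col number,
  flt_col binary_float
);

-- 100,000 rows in increments of 0.001 (0.001, 0.002, ..., 100)
insert into num_test
  select level * 0.001, level * 0.001
    from dual
 connect by level <= 100000;
commit;

-- vsize() reports the number of bytes Oracle uses to store each value
select min(vsize(num_col)) min_num, max(vsize(num_col)) max_num, avg(vsize(num_col)) avg_num,
       min(vsize(flt_col)) min_flt, max(vsize(flt_col)) max_flt, avg(vsize(flt_col)) avg_flt
  from num_test;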
If you run that, you'll see that the number column averages 3.89 bytes/row (with a min of 2 and a max of 4) while every binary_float required 4 bytes of storage. So there is a small saving to using number in this case.
But is that the whole story? Nope. If we use different sample data, we get very different results. Let's run the same test, but divide everything by 3.
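That statement is also missing from the text; one reading that matches the 200,000-row figure below is adding a second batch of 100,000 rows with the values divided by 3 (again, my reconstruction rather than the author's exact code):

-- values like 0.001/3 = 0.000333333... have no short decimal representation,
-- so NUMBER needs many mantissa bytes to store them
insert into num_test
  select level * 0.001 / 3, level * 0.001 / 3
    from dual
 connect by level <= 100000;
commit;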
Now, if you run the same vsize query, you'll see that the number column takes up to 21 bytes of storage and averages 7.69 bytes over the 200,000 rows. The flt_col column is still using a fixed 4 bytes for every row.
Depending on your data, you may find that one or the other approach uses more space. Of course, where there are substantial differences in space utilization, there is also the potential for more substantial rounding issues when you move to floating-point numbers.
Note as well that the results may be version-dependent. I did my test on an 11.2 database, where the documentation indicates that binary_float always takes 4 bytes of storage. In 10.2, the documentation indicates that binary_float took 5 bytes due to the inclusion of an extra length indicator. It's not obvious to me why a 32-bit value would ever need a length byte, so I'm hard-pressed to understand why an earlier version of Oracle might have required the fifth byte, but the documentation is consistent that it did.
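If you'd rather inspect the raw storage on your own version than trust the documentation, DUMP() shows the type code, the stored length (Len=), and the individual bytes (column names as in the sketch above):

-- Len= in the output is the number of bytes Oracle actually stored for the value
select dump(num_col) as num_dump,
       dump(flt_col) as flt_dump
  from num_test
 where rownum = 1;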