Postgresql – What does ‘fully determines’ mean in the context of postgres functional dependency statistics

postgresql

I am following the example of how to use 'create statistics' in postgresql from this page.

CREATE TABLE tbl (
                     col1 int,
                     col2 int
);

INSERT INTO tbl SELECT i/10000, i/100000
FROM generate_series (1,10000000) s(i);

CREATE STATISTICS s1 (dependencies) on col1, col2 from tbl;
ANALYZE tbl;

SELECT stxname, stxkeys, stxdependencies
FROM pg_statistic_ext
WHERE stxname = 's1';  

returns:

stxname,stxkeys,stxdependencies
s1,1 2,{"1 => 2": 1.000000}

However, I'm confused by this line

Looking at this, we can see that Postgres realizes that col1 fully
determines col2 and therefore has a coefficient of 1 to capture that
information.

In the example there are many records in col2 that are repeated. So how does col1 'fully determine' col2? What does this mean?

Best Answer

It means that if col1 is identical for two rows, then col2 will also be identical.

In other words, if the functional dependency were perfect, you could remove col2 from the table and use a lookup table that maps col1 to col2. But don't forget that these are only statistical data from a sample of the rows, so even if you see a dependency of 1 in the extended statistics, that is no proof that it is really always so (only very often).