Google BigQuery – Why No Performance Difference Using WHERE Clause Inside of INNER JOIN?

google-bigqueryperformancequery-performance

Basically I have two different types of query in BQ.
the first one:

select q2.name, q1.* , q2.val1 from table1 as q1
inner join 
(select name,val1, val2 from table2) as q2
on q1.name = q2.name

and second one is:

select q2.name, q1.* , q2.val1 from table1 as q1
inner join 
(select name,val1, val2 from table2  where val1 = "X") as q2
on q1.name = q2.name

so as you can see there, the only difference is filtering X value. When I filter the X value inside of table2, size of table2 is being almost half. So my expectation is seeing a reduction of bytes of data, when I apply the "where" clause. However when I run --dry_run in the BQ cli. I got exactly the same bytes of data.

Query successfully validated. Assuming the tables are not modified, running this query will process 6958332498714 bytes of data.

So can somebody explain why I got exactly the same processed data size?

Best Answer

BigQuery reduces how much data is scanned in a couple scenarios:

For other types of filters, however, such as if you don't partition or cluster a table, you will incur the cost of scanning the entire table. See the query pricing documentation for more information.