Does an increased data dictionary size affect performance?

Tags: data-dictionary, oracle, oracle-11g, performance

As I understand it, the data dictionary:

- is a collection of tables that are part of the SYS schema (if not the whole SYS schema)
- is stored as 'regular' tables, so whatever applies to database tables also applies to the data dictionary (e.g. indexes, statistics)
- should have its statistics kept up to date
- is always cached

Can the size and complexity of the Oracle data dictionary affect database performance?

When I add a new object to my database, one or more new entries are added to the data dictionary. One scenario I can think of involves partitioning. If I have a table with partitions and subpartitions, the data dictionary will hold an entry for the table, an entry for each partition and an entry for each subpartition of each partition. Those entries are spread across different 'tables'. Now imagine tables with thousands of partitions and hundreds of subpartitions each. Even if this number of records is not more than a DBMS can handle, how will it affect performance as it grows?
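The growth described above can be put into rough numbers. The sketch below follows the question's own simplified model (one entry for the table, one per partition, one per subpartition); the real dictionary spreads each object across several internal tables and also keeps per-column statistics, so the function names and the "one statistics row per column per segment" multiplier are assumptions for illustration only.

```python
def object_entries(partitions: int, subpartitions_per_partition: int) -> int:
    """Object-level dictionary entries for one composite-partitioned table,
    using the question's model: 1 table + 1 per partition + 1 per subpartition."""
    table = 1
    subpartitions = partitions * subpartitions_per_partition
    return table + partitions + subpartitions


def column_stat_rows(partitions: int, subpartitions_per_partition: int,
                     columns: int) -> int:
    """Hypothetical estimate of column-statistics rows, assuming one row per
    column at every level (table, partition and subpartition)."""
    return object_entries(partitions, subpartitions_per_partition) * columns


if __name__ == "__main__":
    # "Thousands of partitions and hundreds of subpartitions"
    print(object_entries(1_000, 100))        # 101001
    print(column_stat_rows(1_000, 100, 20))  # 2020020
```

Under these assumptions a single table of this shape adds on the order of a hundred thousand object entries, and potentially millions of statistics rows, before any histograms are counted.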

Note: I have read a lot of discussion about whether and how partitioning affects performance in general. That is out of scope for this question. Partitioning is just an example I gave to show how the data dictionary volume can grow.

Best Answer

So a good starting point for this is Martin Widlake's post on the size of the 'data dictionary'. The term can be a bit woolly, as Oracle stores a bunch of things (such as executable code, job history, etc.) that may or may not be relevant.

https://mwidlake.wordpress.com/2009/08/03/why-is-my-system-tablespace-so-big/

He points out an example where he has 13 GB in a data dictionary segment, "C_OBJ#_INTCOL#". That is technically a cluster rather than a regular table, but it is really just a special table where the data is arranged so that related items tend to be stored in the same block on disk. That object stores histogram information for table columns. A histogram records, for a specific table/column, that 60% of values might be US, 25% Canada and 15% Mexico, or that there is twice as much data for October 2017 as for October 2016.
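To see why that histogram data is worth storing, compare a cardinality estimate that assumes values are evenly spread with one that uses a frequency histogram. This is a simplified sketch, not Oracle's actual costing code; the row count and bucket counts are made-up numbers matching the US/Canada/Mexico example above.

```python
def uniform_estimate(num_rows: int, num_distinct: int) -> int:
    """Without a histogram: assume every distinct value is equally common."""
    return num_rows // num_distinct


def histogram_estimate(num_rows: int, buckets: dict[str, int], value: str) -> int:
    """With a frequency histogram: scale the value's sampled count up to the table."""
    sampled = sum(buckets.values())
    return num_rows * buckets.get(value, 0) // sampled


if __name__ == "__main__":
    rows = 1_000_000
    # Hypothetical sample: 60% US, 25% Canada, 15% Mexico
    country = {"US": 600, "CA": 250, "MX": 150}
    print(uniform_estimate(rows, len(country)))     # 333333
    print(histogram_estimate(rows, country, "MX"))  # 150000
    print(histogram_estimate(rows, country, "US"))  # 600000
```

The uniform guess overestimates Mexico by more than 2x and underestimates the US by almost half, which is the kind of error that can push the optimizer toward the wrong access path.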

Whether that's a good use of space will depend on your situation. Histograms are basically used to determine the best access path: for example, given a query that has both a date path and a zip code path available, they help decide which one is going to involve less work.
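In a deliberately simplified model, that choice can be sketched as "estimate the rows each predicate would return and drive from the smaller set". Real optimizers weigh far more than row counts (clustering factor, I/O and CPU cost, and so on); the histogram values, the zip code and the function names here are hypothetical.

```python
def estimate(num_rows: int, buckets: dict[str, int], value: str) -> int:
    """Frequency-histogram estimate of rows matching column = value."""
    return num_rows * buckets.get(value, 0) // sum(buckets.values())


def cheaper_path(num_rows: int,
                 candidates: dict[str, tuple[dict[str, int], str]]) -> str:
    """Pick the access path whose predicate is expected to return the fewest rows."""
    return min(candidates,
               key=lambda path: estimate(num_rows, *candidates[path]))


if __name__ == "__main__":
    rows = 1_000_000
    # October 2017 has twice as much data as October 2016
    month_hist = {"2016-10": 100, "2017-10": 200, "other": 700}
    zip_hist = {"90210": 5, "10001": 15, "other": 980}
    paths = {
        "date_index": (month_hist, "2017-10"),
        "zip_index": (zip_hist, "90210"),
    }
    print(cheaper_path(rows, paths))  # zip_index
```

Flip the histograms (a very common zip code, a quiet month) and the same query would be better served by the date path.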

Ultimately, it isn't about the size of the data dictionary but, more precisely, about whether storing additional information there can help the optimizer avoid mistakes. Even then, the downside is often the cost of gathering the additional information and keeping it up to date, rather than the space required to store it.

Additionally, a cluster tends to take up a bit more space than a regular table, because keeping related data together means leaving room for the data to grow rather than moving it around when it doesn't fit.

Related Solutions

MySQL – How database size affects performance: Theory vs reality

It depends entirely on what you are doing with the data.

For basic insert/update/delete transactions that affect just a few rows, the growth in data size is probably not a big consideration. The database will use in-memory indexes to access the correct page. You get more cache misses when the tables no longer fit into memory; however, the overhead might be slight, depending on the database, the database configuration, and the hardware configuration.
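One reason a few-row lookup barely notices data growth is that a B-tree index gets deeper only logarithmically. A minimal sketch, assuming a fixed fan-out of 200 entries per index block (a made-up but plausible figure; real fan-out depends on key size and block size):

```python
def btree_levels(rows: int, fanout: int = 200) -> int:
    """Number of index levels needed so that fanout**levels >= rows."""
    levels, capacity = 1, fanout
    while capacity < rows:
        capacity *= fanout
        levels += 1
    return levels


if __name__ == "__main__":
    for rows in (1_000_000, 1_000_000_000):
        print(f"{rows:>13,} rows -> {btree_levels(rows)} levels")
```

With these assumptions, going from a million rows (3 levels) to a billion rows (4 levels) adds a single extra block visit per lookup.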

If you are running queries that require full-table scans, their run time is going to grow linearly or worse with the data size. Indexes can actually make the situation worse for such queries by randomizing page accesses, which pretty much guarantees cache misses.
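The cache-miss effect can be demonstrated with a toy buffer cache. This sketch assumes an LRU cache of 200 pages in front of a 2,000-page table with 100 rows per page, and an index whose order is unrelated to the physical row order; all sizes are hypothetical. It compares fetching 20% of the rows through such an index with simply scanning every page once.

```python
import random
from collections import OrderedDict


class LRUCache:
    """Tiny page cache that counts misses (physical reads)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.pages = OrderedDict()  # page number -> None, oldest first
        self.misses = 0

    def read(self, page: int) -> None:
        if page in self.pages:
            self.pages.move_to_end(page)    # hit: mark as most recently used
            return
        self.misses += 1                    # miss: fetch from disk
        self.pages[page] = None
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)  # evict least recently used


def full_scan_misses(num_pages: int, cache_pages: int) -> int:
    cache = LRUCache(cache_pages)
    for page in range(num_pages):           # sequential: each page once
        cache.read(page)
    return cache.misses


def index_fetch_misses(num_pages: int, rows_per_page: int, fraction: float,
                       cache_pages: int, seed: int = 42) -> int:
    rng = random.Random(seed)
    total_rows = num_pages * rows_per_page
    # Index order is unrelated to table order, so wanted rows arrive shuffled
    wanted = rng.sample(range(total_rows), int(total_rows * fraction))
    cache = LRUCache(cache_pages)
    for row in wanted:
        cache.read(row // rows_per_page)    # page holding this row
    return cache.misses


if __name__ == "__main__":
    print("full scan :", full_scan_misses(2_000, 200))
    print("via index :", index_fetch_misses(2_000, 100, 0.20, 200))
```

The scan reads each of the 2,000 pages exactly once, while the index path performs tens of thousands of reads because most lookups land on a page that has already been evicted.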

An alternative to adding memory is faster storage; solid-state disks can provide a tremendous improvement.
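A back-of-the-envelope calculation shows why storage speed matters so much for random reads. The latencies below are rough, assumed figures (about 5 ms per random read on a spinning disk, about 0.1 ms on an SSD), not measurements, applied to a hypothetical workload of 36,000 random page reads.

```python
def io_seconds(random_reads: int, latency_us: int) -> float:
    """Total time spent waiting on random reads, given a per-read latency in microseconds."""
    return random_reads * latency_us / 1_000_000


if __name__ == "__main__":
    reads = 36_000
    hdd_us, ssd_us = 5_000, 100  # assumed latencies, not measurements
    print(f"HDD: {io_seconds(reads, hdd_us):.1f} s")  # HDD: 180.0 s
    print(f"SSD: {io_seconds(reads, ssd_us):.1f} s")  # SSD: 3.6 s
```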

Just having more data is unlikely to affect performance unless the tables are used in queries. Is the data redundant within a table or across tables? Having large tables that never get used is messy, but has minimal impact on performance. It is conceivable that if you have zillions of unnecessary tables, compiling queries could start to take more time.

How to find out which tables use reference partitioning from the Oracle data dictionary

To get a list of partitioned tables that have at least one reference-partitioned child table, query the [dba][all][user]_part_tables and [dba][all][user]_constraints data dictionary views (which prefix you can use depends on the privileges granted). Given these example tables:

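-- Parent table: range-partitioned on col2
create table tb_part_parent(
  col  number primary key,
  col2 number
)
partition by range (col2) (
  partition part_1 values less than (100),
  partition part_2 values less than (300),
  partition part_3 values less than (500)
);

-- Child table: inherits the parent's partitioning through foreign key fk_parent_1
-- (the foreign key column must be NOT NULL for reference partitioning)
create table tb_part_child(
  col  number not null,
  col2 number,
  constraint fk_parent_1 foreign key(col) references tb_part_parent(col)
)
partition by reference (fk_parent_1);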

The query:

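-- Tables owning a key that is referenced by a foreign key
-- used for reference partitioning (ref_ptn_constraint_name)
select t.table_name
  from user_constraints t
 where t.constraint_name in ( select w.r_constraint_name
                                from user_constraints w
                                join user_part_tables q
                                  on (q.table_name = w.table_name and
                                      q.ref_ptn_constraint_name = w.constraint_name)
                              );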

Result:

TABLE_NAME
-----------------
TB_PART_PARENT
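The logic of the query can be tried without an Oracle instance by mocking the two views in SQLite. This is only a sketch: the mock tables contain just the columns the query uses, with rows modelled on the example tables, and the primary-key constraint name SYS_C0012345 is a hypothetical stand-in for whatever system-generated name Oracle assigns.

```python
import sqlite3


def referenced_parents(conn: sqlite3.Connection) -> list[str]:
    """Run the same query as above against the mock dictionary views."""
    sql = """
        select t.table_name
          from user_constraints t
         where t.constraint_name in (select w.r_constraint_name
                                       from user_constraints w
                                       join user_part_tables q
                                         on (q.table_name = w.table_name and
                                             q.ref_ptn_constraint_name = w.constraint_name))
    """
    return [row[0] for row in conn.execute(sql)]


def build_mock() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        create table user_constraints (
          table_name text, constraint_name text,
          constraint_type text, r_constraint_name text);
        create table user_part_tables (
          table_name text, partitioning_type text, ref_ptn_constraint_name text);

        -- Parent primary key (system-generated name is hypothetical)
        insert into user_constraints values ('TB_PART_PARENT', 'SYS_C0012345', 'P', null);
        -- Child foreign key pointing at the parent's primary key
        insert into user_constraints values ('TB_PART_CHILD', 'FK_PARENT_1', 'R', 'SYS_C0012345');

        insert into user_part_tables values ('TB_PART_PARENT', 'RANGE', null);
        insert into user_part_tables values ('TB_PART_CHILD', 'REFERENCE', 'FK_PARENT_1');
    """)
    return conn


if __name__ == "__main__":
    print(referenced_parents(build_mock()))  # ['TB_PART_PARENT']
```

A parent referenced only by an ordinary (non-reference-partitioned) foreign key is not returned, because its child has no matching ref_ptn_constraint_name row in user_part_tables.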
