PostgreSQL – String Comparison to Text Column Performance

I have a PostgreSQL database with a column of type text. Coming from a SQL Server background, I would like to know whether this is equivalent to the (n)varchar(max) type.

My specific reason for asking is that I have a table with a column of type text in which I would like to store unique values. The table is updated by regular CSV imports, meaning that for every row in the CSV the text column is checked for an existing entry (a column value in the CSV), and if none is found then that value is inserted into the table.

My understanding is that this could mean checking thousands (or maybe hundreds of thousands) of text values against other text values. I imagine this to be incredibly inefficient. Is this the case?

Best Answer

Yes, text is (roughly) equivalent to varchar(max).

Comparing text values is no less efficient than comparing varchar values in Postgres, because under the hood the two types are identical. The cost of a comparison therefore depends on the length of the values. I don't expect it to be any slower than your current implementation using varchar(max) in SQL Server.

If you want to enforce uniqueness on a column, create a unique index on it. You can then use insert ... on conflict do nothing to insert new values efficiently while the index guarantees that they are unique.
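
A minimal sketch of that approach follows. The table imported_values, the column val, the staging table csv_staging and the index name are hypothetical names chosen for illustration; none of them come from the question. It assumes PostgreSQL 9.5 or later, where ON CONFLICT is available.

```sql
-- Hypothetical table and column names, for illustration only.
CREATE TABLE imported_values (
    val text NOT NULL
);

-- The unique index enforces uniqueness and turns the
-- "does this value already exist?" check into an index lookup.
CREATE UNIQUE INDEX imported_values_val_key ON imported_values (val);

-- Values that already exist (or repeat within the statement)
-- are skipped instead of raising a unique-violation error.
INSERT INTO imported_values (val)
VALUES ('alpha'), ('beta'), ('alpha')
ON CONFLICT (val) DO NOTHING;

-- Set-based variant for a CSV import, assuming the file has first
-- been loaded into a hypothetical staging table (for example with COPY).
CREATE TEMP TABLE csv_staging (val text);

INSERT INTO imported_values (val)
SELECT val
FROM csv_staging
WHERE val IS NOT NULL
ON CONFLICT (val) DO NOTHING;
```

Doing the import as one statement rather than one check per CSV row leaves the duplicate detection entirely to the index.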

However, there is a technical limit on how long an index entry is allowed to be, which is roughly 2700 bytes. You didn't mention how long your values are; if they can exceed that limit, a unique index might not work for you.
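
To see whether the limit matters, you can measure the stored values first. If they are too long, one possible workaround is a unique index on a hash of the value instead of the value itself. That workaround is an assumption on my part rather than something stated in this answer, and the table long_values and its column val are hypothetical names.

```sql
-- Hypothetical table and column names, for illustration only.
CREATE TABLE long_values (
    val text NOT NULL
);

-- How long is the longest stored value, in bytes?
SELECT max(octet_length(val)) AS longest_value_bytes
FROM long_values;

-- Possible workaround (an assumption, not taken from the answer):
-- index a fixed-length hash of the value, so the index entry stays
-- small no matter how long the text is.
CREATE UNIQUE INDEX long_values_val_md5_key ON long_values (md5(val));

-- Without an explicit conflict target, any unique violation
-- (here: the md5 expression index) causes the row to be skipped.
INSERT INTO long_values (val)
VALUES ('some very long value')
ON CONFLICT DO NOTHING;
```

The trade-off is that two different values with the same hash would be treated as duplicates. That is very unlikely for ordinary data, but it is not the same guarantee as an index on the value itself.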

See also this answer: https://dba.stackexchange.com/a/69164

Related Solutions

PostgreSQL Text Pattern Ops Index – Why Index text_pattern_ops on a Text Column?

The documentation often answers such questions, as it does in this case:

The operator classes text_pattern_ops, varchar_pattern_ops, and bpchar_pattern_ops support B-tree indexes on the types text, varchar, and char respectively. The difference from the default operator classes is that the values are compared strictly character by character rather than according to the locale-specific collation rules. This makes these operator classes suitable for use by queries involving pattern matching expressions (LIKE or POSIX regular expressions) when the database does not use the standard "C" locale. As an example, you might index a varchar column like this:
CREATE INDEX test_index ON test_table (col varchar_pattern_ops);
Note that you should also create an index with the default operator class if you want queries involving ordinary <, <=, >, or >= comparisons to use an index. Such queries cannot use the xxx_pattern_ops operator classes. (Ordinary equality comparisons can use these operator classes, however.) It is possible to create multiple indexes on the same column with different operator classes.

The documentation goes on to say:

If you do use the C locale, you do not need the xxx_pattern_ops operator classes, because an index with the default operator class is usable for pattern-matching queries in the C locale.

You can check your locale as follows (it is likely to be a UTF-8 locale rather than "C"):

postgres=> show lc_collate;
 lc_collate
-------------
 en_GB.UTF-8
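
As a rough sketch of what the quoted documentation describes, the statements below create both kinds of index on one column and show a query that each index is a candidate for. The table documents and the column title are hypothetical names. Whether the planner actually picks an index depends on table size and statistics, so no plan output is shown. The last query reads the collation from pg_database, an alternative to SHOW lc_collate, which I believe is no longer available as of PostgreSQL 16.

```sql
-- Hypothetical table and column names, for illustration only.
CREATE TABLE documents (
    title text
);

-- Default operator class: usable for <, <=, >, >= comparisons
-- under the database's collation.
CREATE INDEX documents_title_idx ON documents (title);

-- text_pattern_ops: compares character by character, so it is usable
-- for left-anchored patterns when the collation is not "C".
CREATE INDEX documents_title_pattern_idx
    ON documents (title text_pattern_ops);

-- Candidate for the text_pattern_ops index (left-anchored LIKE).
EXPLAIN SELECT * FROM documents WHERE title LIKE 'Postgre%';

-- Candidate for the default index (ordinary range comparison).
EXPLAIN SELECT * FROM documents WHERE title >= 'A' AND title < 'B';

-- Collation and character classification of the current database.
SELECT datcollate, datctype
FROM pg_database
WHERE datname = current_database();
```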

PostgreSQL – See entire string from a lengthy text field in pgAdmin

Open pgAdmin III.
Select your database.
Click the magnifying glass button, i.e. "Execute arbitrary SQL queries".
Instead of running your query with the green triangle "Execute Query", choose the button two to the right of it, i.e. "Execute Query, write result to file".
Choose your destination; you can then view text of arbitrary length in the file you chose.
You can also use psql, which shows the full text but scrolls in the terminal, if that's any use. It also lets you script your query output using different delimiters.
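
The psql route could look roughly like the script below. The table articles, its columns id and body, and the output path are hypothetical; the backslash commands are psql meta-commands, so this only works inside psql (or in a file passed to psql with -f), not in other clients.

```sql
-- Hypothetical table, columns and file path, for illustration only.

-- Unaligned output avoids padding every row to the width of the longest value.
\pset format unaligned

-- Use a pipe as the field delimiter instead of the default.
\pset fieldsep '|'

-- Send query results to a file instead of the terminal.
\o /tmp/long_text.txt

SELECT id, body FROM articles;

-- Switch output back to the terminal.
\o
```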
