What are the downsides of using LC_CTYPE='C' over a specific localization?
The documentation mentions the relationship between locales and SQL features in Locale Support:
The locale settings influence the following SQL features:
- Sort order in queries using ORDER BY or the standard comparison operators on textual data
- The upper, lower, and initcap functions
- Pattern matching operators (LIKE, SIMILAR TO, and POSIX-style regular expressions); locales affect both case-insensitive matching and the classification of characters by character-class regular expressions
- The to_char family of functions
- The ability to use indexes with LIKE clauses
The first item (sort order) is about LC_COLLATE, and the others all seem to be about LC_CTYPE.
LC_COLLATE
LC_COLLATE affects comparisons between strings. In practice, the most visible effect is the sort order. LC_COLLATE='C' (or POSIX, which is a synonym) means that the byte order drives comparisons, whereas a locale in the language_REGION form means that cultural rules drive the comparisons.
An example with French names, executed from inside a UTF-8 database:
select firstname from (values ('bernard'), ('bérénice'), ('béatrice'), ('boris'))
AS l(firstname)
order by firstname collate "fr_FR";
Result:
firstname
-----------
béatrice
bérénice
bernard
boris
béatrice comes before boris, because the accented é compares as if it were an unaccented e, and e sorts before o. It's a cultural rule. This differs from what happens with the C locale:
select firstname from (values ('bernard'), ('bérénice'), ('béatrice'), ('boris'))
AS l(firstname)
order by firstname collate "C";
Result:
firstname
-----------
bernard
boris
béatrice
bérénice
Now the names with accented E are pushed to the end of the list. The byte representation of é in UTF-8 is the hexadecimal C3 A9, and for o it's 6F. C3 is greater than 6F, so under the C locale, 'béatrice' > 'boris'.
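The byte-order comparison can be reproduced outside the database; here is a minimal sketch in Python (not Postgres itself, just the same principle applied to the UTF-8 encodings):

```python
names = ['bernard', 'bérénice', 'béatrice', 'boris']

# COLLATE "C" compares the raw encoded bytes, so sorting by the UTF-8
# representation reproduces the second result above.
c_order = sorted(names, key=lambda s: s.encode('utf-8'))
print(c_order)  # ['bernard', 'boris', 'béatrice', 'bérénice']

# 'é' (C3 A9) starts with a byte greater than 'o' (6F):
print('é'.encode('utf-8').hex())  # c3a9
print('o'.encode('utf-8').hex())  # 6f
```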
It's not just accents. There are more complex rules with hyphenation, punctuation, and special characters like œ. Such cultural rules are to be expected in every locale.
Now if the strings to compare happen to mix different languages, as with a firstname column for people from all over the world, it might be that no particular locale should dominate anyway, because the alphabets of different languages have not been designed to be sorted against each other. In this case C is a rational choice, and it has the advantage of being faster: nothing can beat pure byte comparisons.
LC_CTYPE
Having LC_CTYPE set to 'C' implies that C library functions like isupper(c) or tolower(c) give expected results only for characters in the US-ASCII range (that is, up to code point 0x7F in Unicode). Because SQL functions like upper(), lower() or initcap() are implemented in Postgres on top of these libc functions, they're affected by this as soon as there are non-US-ASCII characters in strings.
Example:
test=> show lc_ctype;
lc_ctype
-------------
fr_FR.UTF-8
(1 row)
-- Good result
test=> select initcap('élysée');
initcap
---------
Élysée
(1 row)
-- Wrong result
-- collate "C" is the same as if the db had been created with lc_ctype='C'
test=> select initcap('élysée' collate "C");
initcap
---------
éLyséE
(1 row)
For the C locale, é is treated as an uncategorizable character.
Similarly wrong results are also obtained with regular expressions:
test=> select 'élysée' ~ '^\w+$';
?column?
----------
t
(1 row)
test=> select 'élysée' COLLATE "C" ~ '^\w+$';
?column?
----------
f
(1 row)
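The same distinction exists in other regex engines; a sketch in Python, where the re.ASCII flag restricts \w to US-ASCII much like the C locale does (Python is only used here to illustrate the principle, not Postgres behavior):

```python
import re

# Default: \w is Unicode-aware, like a proper locale — 'élysée' matches.
print(bool(re.fullmatch(r'\w+', 'élysée')))            # True

# re.ASCII: \w only matches [a-zA-Z0-9_], like LC_CTYPE='C' — é is not a
# word character, so the match fails.
print(bool(re.fullmatch(r'\w+', 'élysée', re.ASCII)))  # False
```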
Store timestamps as timestamp, or rather timestamptz (timestamp with time zone) since you are dealing with multiple time zones. That enforces valid data and is typically the most efficient option. Be sure to understand the data type; there are some misconceptions floating around.
To address your concern:
passing a correctly formatted timestamp is more complex than a simple number
You can pass and retrieve a UNIX epoch either way if you prefer:
SELECT to_timestamp(1437346800)
, extract(epoch FROM timestamptz '2015-07-20 01:00+02');
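As a cross-check outside SQL, the same conversion can be done in Python, just to illustrate that 1437346800 and '2015-07-20 01:00+02' denote the same instant:

```python
from datetime import datetime, timezone

# Interpret the UNIX epoch value as a UTC instant.
ts = datetime.fromtimestamp(1437346800, tz=timezone.utc)
print(ts.isoformat())       # 2015-07-19T23:00:00+00:00, i.e. 2015-07-20 01:00+02
print(int(ts.timestamp()))  # 1437346800 — the round trip is lossless
```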
If you want to store the current timestamp with writes to the DB, use a timestamptz column with default value now(). The system time on the DB server is typically much more reliable and consistent than multiple clients handing in their respective notion of what time it is.
For INSERT it can be as simple as:
CREATE TABLE foo (
... -- other columns
, created_at timestamptz NOT NULL DEFAULT now()
);
And just don't write to that column. It's filled in automatically.
Best Answer
BYTEA is a good choice here. The BYTEA data type allows storage of binary strings. It stores the LOB inline in the table, using TOAST for large values, and is thus limited to 1 GB. It allows non-printable bytes (in contrast to character strings, which don't). The default input/output format is hex (as of PostgreSQL 9.0; before that, the escape format with octal escapes was the default). Notes:
BYTEA comes close to the SQL standard binary string type BLOB. The functions and operators provided by BYTEA are mostly the same, while the hex input format of BYTEA is different. BYTEA is slower for lengths above 20 MB than the large object (LO) facility (it has no random access).
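To illustrate the hex format: a bytea literal in this format is written as '\x' followed by hex digits. A small Python sketch of building such a literal from raw bytes (client-side only; database drivers normally handle this encoding for you):

```python
payload = bytes([0xDE, 0xAD, 0xBE, 0xEF])

# PostgreSQL hex format for bytea: a backslash-x prefix, then hex digits.
literal = r'\x' + payload.hex()
print(literal)  # \xdeadbeef

# Decoding the hex part recovers the original binary string.
assert bytes.fromhex(literal[2:]) == payload
```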
More details can be found here