If you only have a few variables I would consider keeping separate boolean
columns.
- Indexing is easy. In particular, indexes on expressions are easy.
- Conditions for queries and partial indexing are easy to write and read and meaningful.
- A boolean column occupies 1 byte. For only a few variables this occupies the least space.
- Unlike the other options boolean columns allow
NULL
values for individual bits if you should need that. You can always define columns NOT NULL
if you don't.
Optimizing storage
If you have more than a hand full variables but less than 33, an integer
column may serve you best. (Or a bigint
for up to 64 variables.)
- Occupies 4 bytes on disk.
- Very fast indexing for exact matches (
=
operator).
- Handling individual values may be slower / less convenient than with
bit string
or boolean
.
With even more variables, or if you want to manipulate the values a lot, or if you don't have huge tables and disk space / RAM is no issue, or if you are not sure what to pick, I would consider bit(n)
or bit varying(n)
.
Examples
For just 3 bits of information, individual boolean
columns get by with 3 bytes, an integer
needs 4 bytes and a bit string
6 bytes (5 + 1).
For 32 bits of information, an integer
still needs 4 bytes, a bit string
occupies 9 bytes for the same (5 + 4) and boolean
columns occupy 32 bytes.
Further reading
The file names are 24 hex digits long, broken into 3 fields of 8 digits each. The first is the timeline, which you probably don't need to worry about here. The other two are called the "segment" and the "file". They are really a single counter, the "file" rolls over by incrementing the "segment" by one and going back to zero.
Oddly, only the last two digits of the "file" chunk are ever non-zero. (This is apparently so that the offset into a logical 'segment' fits in a 32 bit uint, which was important at some point in the distant past)
In 9.1 and 9.2, the "file" skips FF. So from FE, it rolls back over to 00 while incrementing the segment number. In 9.3, it switches so that FF is used. So the sequence for your version would go:
0000000100000000000000FD
0000000100000000000000FE
000000010000000100000000
000000010000000100000001
...
00000001000000BE000000F9
00000001000000BE000000FA
00000001000000BE000000FB
00000001000000BE000000FC
00000001000000BE000000FD
00000001000000BE000000FE
00000001000000BF00000000
00000001000000BF00000001
00000001000000BF00000002
00000001000000BF00000003
00000001000000BF00000004
Getting the files from pg_xlog is tricky, because those files get recycled (i.e. renamed to have the name of some future file-name, then overwritten once it gets around to needing that file). This means that the timestamps on the files are all screwy. A recycled log file will have the name of some future file, but the contents and timestamp of some past file.
If a file was recycled before you had a chance to copy it, then your backup has failed and you need to try again with a higher wal_keep_segments.
And of course if you copy the "current" file before it was finalized, it will be missing the changes that occurred after the point you copied it.
It is much easier to get them from the wal archive, because that doesn't have the recycling problem and because they don't show up at all until they are finalized.
Best Answer
Turn the sentence into an array, then pick the first three elements and concatenate them back to a single string:
alternatively, just use
split_part()
three times:split_part()
has the disadvantage, that you can't properly deal with multiple spaces between the words.