Postgresql – How to change the mount point for a column in postgresql table

postgresql

This is my scenario

I have a table t with following columns

id  int
link  varchar(500)
name  varchar(20)
sal   text

In the link column we will store the location for the pdf files(mount point)

Ex: link /ABC/ABC_DATA/Mydata/Test/a1.pdf

there will be one link for every id

Application Users have changed their pdf file location from "/ABC/ABC_DATA/Test/a1.pdf" to "/CBF/CBF_DATA/Documents/Test_DATA/a1.pdf"

Now how can i update the table with new location for all the rows. Table contains nearly 10,000 rows

Note:id and link are related every id contains unique link.

Best Answer

Maybe I'm missing something, but it sounds as if that is a simple case of replacing the value:

update the_table
   set link = replace(link, '/ABC/ABC_DATA/Test/', '/CBF/CBF_DATA/Documents/Test_DATA/');

Related Solutions

Postgresql – Restore postgres data/tablespace to new tablespace (at new mount point)

I don't think you can do what you want here - certainly not without changes to PostgreSQL.

Tablespaces are not an independent datastore, and cannot be exchanged between different PostgreSQL instances. Making them so would be a huge change to PostgreSQL's core, with significant impacts on transactional DDL support, etc.

See this related answer I wrote on the topic, explaining some of the issues with attaching a tablespace to a different PostgreSQL instance: How do I access a old saved tablespace after reinstalling postgres?

If you were to create the entire database in a different tablespace then you'd be able to avoid the issues with pg_class and TOAST tables ... but you'd still have to deal with the problems with the pg_clog, pg_xlog, transaction visibility, etc.

pg_xlog issues could potentially be solved if you were able to copy the tablespace at a point after a checkpoint, before any further writes were performed, so the tablespace was "clean". PostgreSQL doesn't currently offer any assistance to do this, but I think it'd be possible to add it with moderate code changes (at the cost of a stall in the production DB when you flushed it for the checkpoint and snapshotted it) by taking an exclusive lock on the internal xlog insert log. Possibly do-able even as an extension function.

The pg_clog and pg_control problem is a lot harder. Transaction IDs are global across the entire PostgreSQL instance. So your tablespace (or whatever other collection of tables less than an entire instance) will have transaction IDs in the future, and/or with different committed/rolled-back statuses in the local pg_clog to what they had in the original pg_clog. Badness ensues. The only way I can think of to deal with this is to VACUUM FREEZE all tables in the database, which will set their xmin and xmax to FrozenTransactionId, an xid defined in the source code that's always visible for all transactions. However, doing this without having unfrozen xids added in parallel requires that you take an EXCLUSIVE lock (blocking INSERT, UPDATE and DELETE) on all tables in the database for as long as it takes to VACUUM FREEZE and then snapshot the tablespace.

So. Maybe it's possible to do what you want, but there's no way it's possible without production impact. You'd have to basically rewrite PostgreSQL to store per-database pg_clog and pg_xlog, keep a per-database pg_control, and use per-database transaction IDs. Then add support for keeping these in a non-default tablespace. Once you'd done that, you'd have the exciting job of making this work with streaming replication and WAL-shipping based PITR, which currently relies on a single WAL stream as an ordered time-line of events. And you'd have to deal with the extra fsync() costs, etc, of per-DB WAL. Finally, you'd need to figure out how you were going to make this work with PostgreSQL's shared relations, like pg_database, pg_authid, etc, that live in the pg_global tablespace - they are transactional tables and participate in transactions along with tables in the current database, and you'd have to find a way to keep that working when you now had two separate transaction IDs (global and per-database) that had to commit or rollback atomically.

This is quite an interesting problem, and I suggest raising it on the pgsql-general mailing list. Link back here for context (and so people can explain all the things I got wrong in the above explanation).

Personally I'd really like to see PostgreSQL able to handle individual databases or tablespaces more independently, because I'd really like to be able to use replication to stream only a particular DB, or a subset of a DB. Right now that's just not possible, meaning you must do frustrating things like log-ship change data for high churn transaction tables and low-churn important archive tables together, or separate them into completely separate Pg instances. It doesn't look easy to do; the new logical replication and BDR feature should help, though.

Postgresql – Searching a big table for a few results without scanning the whole table

but using the "limit" keyword cant prevent the search over the whole set of data(is it correct?)

Correct; it'll just limit how many results are returned.

For some kinds of query it can also limit how many are scanned in the first place, but you can't rely on that in the general case.

does PostgreSQL database have to create this 1M search result rows temporarily in memory and then give me my needed 100 results?

Create, yes. In memory, not necessarily. It'll often spill to disk, or discard unneeded results as it goes so it only keeps the top 100. It depends on the details of the query.

or is there any way to tell PostgreSQL to stop searching as soon as it find 100 records?

Write a query where that is possible. You haven't shown the query, or schema, so this part is impossible to usefully answer.

In general, if you have an ORDER BY on some kind of search relevance field then the whole data set must be searched then filtered for the top-n results.

In short: "It depends". Specifically, it depends on the schema, available indexes, and on exactly what you're searching for and what results you want.

This:

SELECT id 
FROM mytable 
WHERE somefield > 100 
ORDER BY somefield DESC
LIMIT 100;

would only scan the needed rows if there was an index on mytable(somefield DESC).

This:

SELECT id, title
FROM mytable
WHERE title LIKE '%something%'
ORDER BY calculate_relevance(title, 'something')
LIMIT 100;

would always scan the whole table, both because an infix text pattern match (LIKE '%blah%') can't use a b-tree index, and because you can't create an expression index on a function taking a literal parameter like the case of the imaginary calculate_relevance function above.

So... it depends totally on what you're searching for and how.

Is this answer too hand-wavey and general for you? So's the question. Supply specifics, and you'll get more specific answers. A http://sqlfiddle.com/ of the schema with sample data, plus your PostgreSQL version, a table of the expected results, and a description of what you want to find is usually the minimum for this kind of thing.

Update: A few problems here.

There's no index on tag1 and/or tag2 so Pg must do a seqscan of the table, searching it until it finds 100 results. This is extremely inefficient.
There's no ORDER BY, so PostgreSQL can return whatever results it feels like for a LIMIT and return them in any order.
Your schema shows signs of being unnecessarily denormalized. tag1, tag2, etc usually suggest bad table design. I strongly recommend that you study relational database design and normalization.

In this specific case, you'd be able to avoid the full table scan if you CREATE INDEX con_tag1_tag2_idx ON con(tag1, tag2). But that'll only work for searches that use tag1 and optionally also tag2. It won't help (much) with searches for just tag2, searches for ref1, etc. (Thanks Erwin for the important note on multicolumn index use with the first column not included, it was a surprise to me).

Don't create lots of indexes. Each one uses disk space and takes disk I/O to insert, update and delete.

The answer is often proper normalization.

Overall I think you need to buy/borrow a couple of good introductory books on relational database design and basic to intermediate SQL, preferably PostgreSQL-focused, and do some study.

Best Answer

Related Solutions

Postgresql – Restore postgres data/tablespace to new tablespace (at new mount point)

Postgresql – Searching a big table for a few results without scanning the whole table

Related Question