PostgreSQL Database Design – Many-to-Many Table with Multiple Same-Column References

database-designpostgresql

I'm designing a document translation DB and struggling to connect the non-generic many-to-many table.

Each document has a title and summary.

Translated Document table:

CREATE table translated_documents (
  document_id         bigint references documents (id),
  source_language_id  int not null,
  source_title_hash   uuid, -- md5 hash cast to uuid
  source_summary_hash uuid,
  PRIMARY KEY (document_id)
);


CREATE TABLE translations (
  content_hash            uuid not null, -- holds either source_title_hash or source_summary_hash
  translation_language_id int  not null,
  content_translation     text not null,
  PRIMARY KEY (content_hash, translation_language_id)
);

An attempt at a many-to-many table, trying to reference the content_hash column twice.

CREATE TABLE translated_documents_translations (
  document_id         bigint references translated_documents (document_id),
  source_title_hash   uuid not null references translations (content_hash),
  source_summary_hash uuid not null references translations (content_hash),
  PRIMARY KEY (source_title_hash),
  CONSTRAINT same_hash_not_allowed CHECK (source_title_hash <> source_summary_hash)
);

My thinking is that documents can be translated into many languages, and translations can work for many documents with the same title or summary. Is there a better way to set this up?

Best Answer

I think unless you have to save space or share a lot of translations you could just do one-to-many mapping documents -> document_translations

CREATE TABLE document_translations
(
    document_id BIGINT NOT NULL,
    language_id INT NOT NULL,
    title TEXT,
    summary TEXT,
    CONSTRAINT document_translations_pk PRIMARY KEY (language_id, document_id)
);

Then you could use external translation memory to help with translations.

If you want to share translations then I don't think you need translated_documents_translations as many-to-many table. Just use translated_documents like this:

CREATE TABLE translated_documents
(
  document_id BIGINT NOT NULL
    CONSTRAINT translated_documents_pkey
    PRIMARY KEY
    CONSTRAINT translated_documents_documents_id_fk
    REFERENCES documents,
  language_id INTEGER,
  title UUID,
  summary UUID,
  CONSTRAINT translated_documents_title_fk
  FOREIGN KEY (title, language_id) REFERENCES translations,
  CONSTRAINT translated_documents_summary_fk
  FOREIGN KEY (summary, language_id) REFERENCES translations
);

CREATE TABLE translations
(
  content_hash UUID NOT NULL,
  language_id INTEGER NOT NULL,
  translation TEXT,
  CONSTRAINT translations_content_hash_language_id_pk
  PRIMARY KEY (content_hash, language_id)
);

And JOIN documents -> documents_translations -> translations