I'm working on a project that will result in an SQLite database of about 6 GB of text content (encoded as UTF-8). The text will be diverse: it will have a great deal of plain-text writing, but also a significant number of special characters (tildes, backticks, section symbols, em dashes, en dashes, etc.). There will also be math formulas.
It looks like .import will be the load method.
Question: What could I use as a .separator value that won't be in my text?
I've grepped the future text with a few ideas and have not identified a separator that isn't in the actual content.
I suppose I could escape whatever separators appear in the text, but I would prefer to avoid that option if I can.
Best Answer
I was just about to suggest using a multi-character separator or a character from a foreign language (e.g. 日), but .separator does not allow multi-character strings or even multi-byte characters.
I would use a multi-character separator that is very unlikely to occur in the text being imported, and then parse the document with a custom script. Here is an example in Python3 using $$$$$$ as a separator, but consider strings like [ŠĐć~^˘°˛˙€] if needed.
import.csv
sqliteimport.py
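A minimal sketch of what such a script could look like, using only the standard-library sqlite3 module. It assumes one record per line, exactly two fields per record, and a table named docs with columns title and body; those names, the column count, and the function names are hypothetical and should be adapted to the real schema.

```python
import sqlite3

SEPARATOR = "$$$$$$"  # multi-character separator assumed absent from the text


def read_rows(path, separator=SEPARATOR, n_columns=2):
    """Yield one list of fields per line, split on the separator."""
    with open(path, encoding="utf-8") as fh:
        for lineno, line in enumerate(fh, start=1):
            fields = line.rstrip("\n").split(separator)
            # A wrong field count means the separator occurred inside the
            # content (or a field is missing), so fail loudly.
            if len(fields) != n_columns:
                raise ValueError(
                    f"line {lineno}: expected {n_columns} fields, got {len(fields)}"
                )
            yield fields


def import_file(path, db_path, separator=SEPARATOR):
    """Load a separator-delimited text file into the hypothetical docs table."""
    conn = sqlite3.connect(db_path)
    try:
        conn.execute("CREATE TABLE IF NOT EXISTS docs (title TEXT, body TEXT)")
        # executemany accepts any iterable, so rows are streamed from the
        # generator rather than held in memory all at once.
        conn.executemany(
            "INSERT INTO docs (title, body) VALUES (?, ?)",
            read_rows(path, separator),
        )
        conn.commit()
    finally:
        conn.close()


if __name__ == "__main__":
    import_file("import.csv", "content.db")
```

Because the rows come from a generator, the whole 6 GB file is never loaded at once. The field-count check is a cheap guard: if the chosen separator does turn up in the content, the import stops with the offending line number instead of silently storing shifted columns.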