Nice way to SQL query multiple CSV files

Tags: csv, database, sql

I was looking for a way to traverse CSV files as relational database tables.

I did some research, but nothing I found fit my bill of requirements in its entirety. I found several partially good options, namely:

  1. termsql – takes stdin or a file and lets you run some SQL on it, but sets up only one "table"
  2. csv2sqlite – very promising, since it potentially allows for more SQL goodness than termsql, but still only one "table"
  3. this ULSE question – describes how one could implement set operations with standard Unix file-processing commands; promising, and a possible starting point (see the join sketch below)
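Along the lines of that third option, plain coreutils join already gets part of the way there. A minimal sketch, assuming comma-delimited files with no quoted commas and the join key in the first column of both files:

sort -t, -k1,1 tbl1.csv > tbl1.sorted.csv
sort -t, -k1,1 tbl2.csv > tbl2.sorted.csv
join -t, tbl1.sorted.csv tbl2.sorted.csv   # inner join on field 1

This breaks down quickly (header rows, quoted fields, multi-column keys), which is part of why a real SQL layer is attractive.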

It is possible, and very straightforward, to traverse a single csv/text file and perform some database-like operations on it (column sums, averages, min, max, subsets, etc.), but not on two files with some connection between them.
It is also possible to import the files into a temp DB for querying, and I have done this, although it is not as practical as I would like.
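For the record, the temp-DB route can at least be scripted. A sketch using sqlite3 dot-commands, assuming both files have header rows and share a column named id (the id column is a placeholder for whatever the real key is):

sqlite3 :memory: <<'EOF'
.mode csv
.import tbl1.csv tbl1
.import tbl2.csv tbl2
SELECT tbl1.*, tbl2.* FROM tbl1 JOIN tbl2 ON tbl1.id = tbl2.id;
EOF

It works, but typing that out for every one-off question is exactly the friction I want to avoid.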

TL;DR – I basically want a convenient way to do quick and dirty SQL joins on CSV files. I'm not looking for a full-fledged text-based RDBMS, just a nicer way to do some analysis on CSV extracts from an RDBMS.

Example:

sqlthingy -i tbl1.csv tbl2.csv -o 'select 1,2,3 from tbl1, tbl2 where tbl1.1 = tbl2.1'

This seems like an interesting enough problem that I could devote some time to it, but I'd like to know whether something like it already exists.

Best Answer

Take a look at fsql (Perl) or csvkit (Python). Both have various problems and limitations, but they are often fine for "small" data. And, of course, you can always fall back to a proper database when they aren't enough.
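For instance, csvkit's csvsql can run a join directly over two files: it loads them into an in-memory SQLite database, with table names defaulting to the file basenames. A sketch, assuming both files share a column named id:

csvsql --query 'select a.*, b.* from tbl1 a join tbl2 b on a.id = b.id' tbl1.csv tbl2.csv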
