Mysql – Extract query fingerprints and tables from large log files using pt-query-digest

logsMySQL

I tried to parse a large log file using pt-query-digest but it gives "Out of memory!" when I try to read the file. I have 3GB RAM, 40GB empty space on HDD, the file is 20GB. I did not found anything in the documentation about memory.

I reduce the command at:

pt-query-digest --group-by fingerprint --print --no-report /path/to/largeLogFile.log
pt-query-digest --table-access /path/to/largeLogFile.log

I have 2 questions:

  1. How can I extract the query fingerprints from a large log file generated by general_log ?

  2. How can I extract all the tables that were used ?

Edit:
Tried @DTest –sample option (with 2 and 1), but the result is the same. Just before it reaches 50% memory in htop it dies every time (see screnshoot): enter image description here

I did an strace just before it dies I get:

read(3, "phone from teams as t ri"..., 4096) = 4096
mremap(0x5b27a000, 1543499776, 1543503872, MREMAP_MAYMOVE) = -1 ENOMEM (Cannot allocate memory)
mmap2(NULL, 1543503872, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
brk(0x64c2d000)                         = 0x8c2e000
mmap2(NULL, 1543634944, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
mmap2(NULL, 2097152, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x5b07a000
munmap(0x5b07a000, 548864)              = 0
munmap(0x5b200000, 499712)              = 0
mprotect(0x5b100000, 135168, PROT_READ|PROT_WRITE) = 0
mmap2(NULL, 1543503872, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
write(2, "Out of memory!\n", 15Out of memory!
)        = 15
munmap(0x5b27a000, 1543499776)          = 0
... [output removed] ...

The only option I can think so far is splitting the file "bite size" pieces (let's say 20 files of 1GB each), digest them and merge the result.

Best Answer

I would try to use the --sample option:

pt-query-digest --sample 2 --print --no-report /path/to/largeLogFile.log

This would filter out all but the first 2 occurrences of a query fingerprint. --no-report removes the memory and cpu overhead of aggregating.