Group by Continuous Days in MySQL – How to Guide

group byMySQL

I have table where I am storing some sequential data that indicate occurrences of some event given day.

id     from_id        to_id   date
1      2              1       2014-08-01
2      2              1       2014-08-02

3      2              1       2014-08-04
4      2              1       2014-08-05
6      2              1       2014-08-06

Sometimes event hasn't got place. Then I would like to get min and max from every uninterrupted sequences of records by data. According to example data it should return

from_id        to_id   min            max
2              1       2014-08-01     2014-08-02
2              1       2014-08-04     2014-08-06

If I would do simple group by with min and max

Select from_id, to_id, min(date), max(date) from events group by from_id, to_id

it would return

from_id        to_id   min            max
2              1       2014-08-01     2014-08-06

How can I do that in MySQL?

Best Answer

SELECT from_id,to_id,
  CONCAT_WS(' - ',
    MIN(dt),
    CASE WHEN MAX(dt)>MIN(dt) THEN MAX(dt) END
  ) As intervals
FROM (
  SELECT from_id,to_id,
    CASE WHEN dt=@last_ci+INTERVAL 1 DAY THEN @n ELSE @n:=@n+1 END AS g,
    @last_ci := dt As dt
  FROM
    t, (SELECT @n:=0) r
  ORDER BY
    dt
) s
GROUP BY
  g

FIDDLE

Related Solutions

Mysql – How to create a user-defined aggregate function in MySQL (redux)

OK, it's almost working; the only thing that is wrong is I evidently don't know how to coerce the constant argument args->args[1] to (any) type of int value. If I code target_depth to a fixed integer, the program works. It's only when I try to read it in from the query that I get the "ERROR 2013 (HY000): Lost connection to MySQL server during query" error. Any help with reading in the target depth for UDF_SMALL (2nd lowest value, 3rd lowest value, etc.) would be greatly appreciated.

/******************************************************************************
 ** udf_small.c
 **
 ** This MySQL user-defined function (UDF) sorts through an aggregate dataset, 
 ** then returns the ith smallest element each group in the GROUP BY statement.
 **
 ** Author:        Jeffrey Rolland
 ** Creation Date: 12/07/2014
 ** Modifications: None
 ******************************************************************************/

/******************************************************************************
** A dynamically loadable file should be compiled shared.
** (something like: gcc -shared -o my_func.so -I /usr/includes/mysql/ my_func.c).
** You can easily get all switches right by doing:
** cd sql ; make udf_example.o
** Take the compile line that make writes, remove the '-c' near the end of
** the line and add -shared -o udf_example.so to the end of the compile line.
** The resulting library (udf_example.so) should be copied to some dir
** searched by ld. (/usr/lib/mysql/plugin/ ?)
** If you are using gcc, then you should be able to create the udf_example.so
** by simply doing 'make udf_example.so'.
**
** After the library is made one must notify mysqld about the new
** functions with the commands:
**
** CREATE FUNCTION metaphon RETURNS STRING SONAME "udf_example.so";
** CREATE FUNCTION myfunc_double RETURNS REAL SONAME "udf_example.so";
** CREATE FUNCTION myfunc_int RETURNS INTEGER SONAME "udf_example.so";
** CREATE FUNCTION sequence RETURNS INTEGER SONAME "udf_example.so";
** CREATE FUNCTION lookup RETURNS STRING SONAME "udf_example.so";
** CREATE FUNCTION reverse_lookup RETURNS STRING SONAME "udf_example.so";
** CREATE AGGREGATE FUNCTION avgcost RETURNS REAL SONAME "udf_example.so";
** CREATE FUNCTION myfunc_argument_name RETURNS STRING SONAME "udf_example.so";
**
** After this the functions will work exactly like native MySQL functions.
** Functions should be created only once.
**
** The functions can be deleted by:
**
** DROP FUNCTION metaphon;
** DROP FUNCTION myfunc_double;
** DROP FUNCTION myfunc_int;
** DROP FUNCTION lookup;
** DROP FUNCTION reverse_lookup;
** DROP FUNCTION avgcost;
** DROP FUNCTION myfunc_argument_name;
**
** The CREATE FUNCTION and DROP FUNCTION update the func@mysql table. All
** Active function will be reloaded on every restart of server
** (if --skip-grant-tables is not given)
**
** If you ge problems with undefined symbols when loading the shared
** library, you should verify that mysqld is compiled with the -rdynamic
** option.
**
** If you can't get AGGREGATES to work, check that you have the column
** 'type' in the mysql.func table. If not, run 'mysql_upgrade'.
**
**************************************************************************/

/*************************************************************************
** Syntax for the new aggregate commands are:
** CREATE AGGREGATE FUNCTION <function_name> RETURNS {STRING|REAL|INTEGER}
** SONAME <name_of_shared_library>
**
** Syntax for avgcost: AVGCOST( t.quantity, t.price )
** with t.quantity=integer, t.price=double
**
** (This example is provided by Andreas F. Bobak <bobak@relog.ch>)
**************************************************************************/

#ifdef STANDARD
    /* STANDARD is defined, don't use any mysql functions */
    #include <stdlib.h>
    #include <stdio.h>
    #include <string.h>
    #ifdef __WIN__
        typedef unsigned __int64 ulonglong; /* Microsofts 64 bit types */
        typedef __int64 longlong;
    #else
        typedef unsigned long long ulonglong;
        typedef long long longlong;
    #endif /*__WIN__*/
#else
    #include <my_global.h>
    #include <my_sys.h>
    #if defined(MYSQL_SERVER)
        #include <m_string.h>   /* To get strmov() */
    #else
        /* when compiled as standalone */
        #include <string.h>
        #define strmov(a,b) stpcpy(a,b)
        #define bzero(a,b) memset(a,0,b)
    #endif
#endif
#include <mysql.h>
#include <ctype.h>
#ifdef _WIN32
    /* inet_aton needs winsock library */
    #pragma comment(lib, "ws2_32")
#endif
#ifdef HAVE_DLOPEN
    #if !defined(HAVE_GETHOSTBYADDR_R) || !defined(HAVE_SOLARIS_STYLE_GETHOST)
        static pthread_mutex_t LOCK_hostname;
    #endif
#endif
#include <stdbool.h>

my_bool udf_small_init( UDF_INIT* initid, UDF_ARGS* args, char* message );
void udf_small_deinit( UDF_INIT* initid );
void udf_small_reset( UDF_INIT* initid, UDF_ARGS* args, char* is_null, char *error );
void udf_small_clear( UDF_INIT* initid, UDF_ARGS* args, char* is_null, char *error );
void udf_small_add( UDF_INIT* initid, UDF_ARGS* args, char* is_null, char *error );
double udf_small( UDF_INIT* initid, UDF_ARGS* args, char* is_null, char *error );

struct SortArray
{
    ulonglong length;
    double *data;
};

/*************************************************************************
** Example of init function
** Arguments:
** my_bool maybe_null 1 if function can return NULL
** Default value is 1 if any of the arguments
** is declared maybe_null
**
**
** initid Points to a structure that the init function should fill.
** This argument is given to all other functions.
**
** unsigned int decimals Number of decimals.
** Default value is max decimals in any of the
** arguments.
**
** unsigned int max_length Length of string result.
** The default value for integer functions is 21
** The default value for real functions is 13+
** default number of decimals.
** The default value for string functions is
** the longest string argument.
**
** char *ptr; A pointer that the function can use.
**
**
** args Points to a structure which contains:
** unsigned int arg_count Number of arguments
**
** enum Item_result *arg_type Types for each argument.
** Types are STRING_RESULT, REAL_RESULT
** and INT_RESULT.
**
** char **args Pointer to constant arguments.
** Contains 0 for not constant argument.
**
** unsigned long *lengths; max string length for each argument
** char *maybe_null Information of which arguments
** may be NULL
**
**
** message Error message that should be passed to the user on fail.
** The message buffer is MYSQL_ERRMSG_SIZE big, but one should
** try to keep the error message less than 80 bytes long!
**
** This function should return 1 if something goes wrong. In this case
** message should contain something usefull!
**************************************************************************/
my_bool udf_small_init( UDF_INIT* initid, UDF_ARGS* args, char* message )
{
    struct SortArray *sort = NULL;
    ulonglong target_depth;
    ulonglong i;

    if (args->arg_count != 2)
    {
        strcpy(message, "Wrong number of arguments: SMALL() requires two arguments");
        return 1;
    }
    if ((args->arg_type[0] != REAL_RESULT) || (args->arg_type[1] != INT_RESULT) )
    {
        strcpy(message, "Wrong argument type: SMALL() requires an REAL and a INT");
        return 1;
    }
    if (*((longlong*)args->args[1]) < 1)
    {
        strcpy(message, "Wrong argument value: SMALL() requires second parameter that is positive");
        return 1;
    }

    initid->maybe_null = 0; /* The result may be null */
    initid->decimals = 10; /* We want 10 decimals in the result */
    /* initid->max_length = 10; */ /* 10 digits + . + 10 decimals */

    target_depth = *((longlong*)args->args[1]);

    sort = malloc((sizeof(struct SortArray)));

    if(sort == NULL)
    {
        strcpy(message, "Couldn't allocate memory");
        return 1;
    }   

    initid->ptr = (char*)sort;

    sort->length = 0;   

    sort->data = malloc((target_depth)*(sizeof(double)));

    if(sort->data == NULL)
    {
        strcpy(message, "Couldn't allocate memory");
        return 1;
    }   

    for(i = 0; i < target_depth; i++)
    {
        sort->data[i] = 999999999;
    }

    return 0;
}

/****************************************************************************
** Deinit function. This should free all resources allocated by
** this function.
**
** Arguments:
** initid Return value from xxxx_init
****************************************************************************/
void udf_small_deinit( UDF_INIT* initid )
{
    free(initid->ptr);  
}

/****************************************************************************
** Small Aggregate Function.
**
** There are 3 extra functions for an aggregate function, xxx_reset, 
** xxx_clear, and xxx_add
****************************************************************************/

/****************************************************************************
** xxx_reset
**
** Arguments:
** initid Structure filled by xxx_init
**
** args The same structure as to xxx_init. This structure
** contains values for all parameters.
** Note that the functions MUST check and convert all
** to the type it wants! Null values are represented by
** a NULL pointer
**
** is_null If the result is null, one should store 1 here.
**
** message Error message that should be passed to the user on fail.
** The message buffer is MYSQL_ERRMSG_SIZE big, but one should
** try to keep the error message less than 80 bytes long!.
**
** xxx_reset gets done every time we begin processing the records for a new 
** group. It calls xxx_clear and xxx_add
****************************************************************************/

/* This is only for MySQL 4.0 compability */
void udf_small_reset(UDF_INIT* initid, UDF_ARGS* args, char* is_null, char* message)
{
    udf_small_clear(initid, args, is_null, message);
    udf_small_add(initid, args, is_null, message);
}

/****************************************************************************
** xxx_clear
**
** Arguments:
** initid Structure filled by xxx_init
**
** args The same structure as to xxx_init. This structure
** contains values for all parameters.
** Note that the functions MUST check and convert all
** to the type it wants! Null values are represented by
** a NULL pointer
**
** is_null If the result is null, one should store 1 here.
**
** message Error message that should be passed to the user on fail.
** The message buffer is MYSQL_ERRMSG_SIZE big, but one should
** try to keep the error message less than 80 bytes long!.
**
** xxx_clear resets the processing variables to their initial values for a 
** new group
****************************************************************************/

/* This is needed to get things to work in MySQL 4.1.1 and above */
void udf_small_clear(UDF_INIT* initid, UDF_ARGS* args, char* is_null __attribute__((unused)), char* message __attribute__((unused)))
{
    struct SortArray *sort = NULL;
    ulonglong target_depth;
    ulonglong i;

    sort = (struct SortArray*)initid->ptr;

    target_depth = *((longlong*)args->args[1]);

    /* if(sort == NULL)
    {
        strcpy(message, "Initid Pointer is NULL");
        exit(1);
    }

    if(sort->data == NULL)
    {
        strcpy(message, "Couldn't allocate memory");
        exit(1);
    } */

    sort->length = 0;

    for(i = 0; i < target_depth; i++)
    {
        sort->data[i] = 999999999;
    }
}

/****************************************************************************
** xxx_add
**
** Arguments:
** initid Structure filled by xxx_init
**
** args The same structure as to xxx_init. This structure
** contains values for all parameters.
** Note that the functions MUST check and convert all
** to the type it wants! Null values are represented by
** a NULL pointer
**
** is_null If the result is null, one should store 1 here.
**
** message Error message that should be passed to the user on fail.
** The message buffer is MYSQL_ERRMSG_SIZE big, but one should
** try to keep the error message less than 80 bytes long!.
**
** xxx_add is the main processing workhorse of a anggregate UDF. It processes
** each new record in the group.
****************************************************************************/

void udf_small_add(UDF_INIT* initid, UDF_ARGS* args, char* is_null __attribute__((unused)), char* message __attribute__((unused)))
{
    if (args->args[0] && args->args[1])
    {
        struct SortArray *sort = NULL;
        ulonglong target_depth;
        double col_value;
        ulonglong counter;
        ulonglong i, j;
        bool done;

        sort = (struct SortArray*)initid->ptr;

        target_depth = *((longlong*)args->args[1]); 

        col_value = *((double*)args->args[0]);  

        /* if(sort == NULL)
        {
            strcpy(message, "Initid pointer is NULL");
            exit(1);
        } */

        i = target_depth;
        counter = 1;
        done = FALSE;

        /* sort->data[0] = col_value + 1.0;
        sort->data[1] = col_value;

        col_value = 0.05; */

        while ((!(done)) && (i > 0))
        {
            if(col_value <= sort->data[i-1])
            {
                for(j = 0; j < i-1; j++)
                {
                    sort->data[j] = sort->data[j+1];
                }
                sort->data[i-1] = col_value;
                done = TRUE;
            }
            else
            {
                i--;
                counter++;
            }
        }

        /* if(counter >= sort->length)
        {
            sort->length = counter;
        } */

    }
}

/***************************************************************************
** UDF double function.
**
** Arguments:
** initid Structure filled by xxx_init
**
** args The same structure as to xxx_init. This structure
** contains values for all parameters.
** Note that the functions MUST check and convert all
** to the type it wants! Null values are represented by
** a NULL pointer
**
** is_null If the result is null, one should store 1 here.
**
** error If something goes fatally wrong one should store 1 here.
**
** This function should return a double. It returns a value at the end of 
** each group.
***************************************************************************/

double udf_small( UDF_INIT* initid, UDF_ARGS* args __attribute__((unused)), char* is_null, char* error __attribute__((unused)))
{
    struct SortArray *sort = NULL;
    ulonglong target_depth;

    sort = (struct SortArray*)initid->ptr;
    target_depth = *((longlong*)args->args[1]);

    sort->length = target_depth; /* Kill me once working */

    if(sort != NULL)
    {
        if(sort->length == target_depth)
        {
            *is_null = 0;
            return sort->data[0];
        }   
    }
    else
    {
        *is_null = 1;
        return  10.0;
    }
}

MySQL Optimization – Aggregation and Grouping Techniques

Using temporary and filesort are not as evil as everyone makes them out to be.

Do not split a DATE into year, month, and day columns; it will cause more pain than it is worth.

I prefer (in your specific case) to do GROUP BY LEFT(c.date, 7), although it may not speed up anything.

"Summary Tables" is the way to go for that type of query, assuming you have large tables. More discussion in my blog. You would add one (maybe more) row to the summary table each night, then a query against that table could summarize any day range, including your month.

Do not store AVG(...) in a Summary table; it is probably more 'correct' to store SUM(foo) AS sum_foo and COUNT(*) AS foo_ct, then have the report

SUM(sum_foo) / SUM(foo_ct) AS 'Average foo'

Probably the optimal summary table would have

PRIMARY KEY(goal_id, date)

and have sum_conversions calculated from SUM(c.conversions), plus anything else relevant. Often, one summary table can handle a small set of standard 'reports'.

Best Answer

Related Solutions

Mysql – How to create a user-defined aggregate function in MySQL (redux)

MySQL Optimization – Aggregation and Grouping Techniques

Related Question