Mysql – How to create a user-defined aggregate function in MySQL (redux)

functionsMySQL

OK, I'm probably going to get banned for asking the same question a second time, but I'm an incredible newbie here and I was told I don't have enough of a reputation to comment on the original question.

I agree with the OP's (Matt Fenwick's) comment to answer 5 in How do I create a user-defined aggregate function? in that it gives a fish for a day, but the real question is learning how to fish.

I need, among other things, a GROUP BY aggregate function like the MIN function, but on call SMALL(t.value, 2) it return the second smallest data element, on call SMALL(t.value, 3) it return the third smallest data element, and so on. (This actually works with LibreOffice Calc's SMALL function, BTW.)

The problem for my particular reason for asking this problem is I would like to convert my grading spreadsheet for a college course to a MySQL database, and I need to drop the two lowest attendance score, the two lowest homework scores, and the two lowest quiz scores for each students to compute their adjusted attendance percentage, adjusted homework average, and adjusted quiz average, among other adjustments to the data.

The tblQuizzes looks something like

mysql> select * from tblQuizzes LIMIT 12;
+------------+---------+-----------+
| StudentKey | QuizKey | QuizGrade |
+------------+---------+-----------+
|          1 |       1 |     0.123 |
|          2 |       1 |     0.456 |
|          3 |       1 |     0.789 |
|          4 |       1 |     0.890 |
|          5 |       1 |     0.123 |
|          6 |       1 |     1.000 |
|          1 |       2 |     0.789 |
|          2 |       2 |     0.123 |
|          3 |       2 |     0.456 |
|          4 |       2 |     0.789 |
|          5 |       2 |     0.123 |
|          6 |       2 |     1.000 |
+------------+---------+-----------+

I would need a query like

SELECT
tblQuizzes.StudentKey,
(SUM(tblQuizzes.QuizGrade) - MIN(tblQuizzes.QuizGrade) - UDF_SMALL(tblQuizzes.QuizGrade, 2))/(COUNT(tblQuizzes.QuizGrade) - 2) AS AdjQuizAverage
FROM
tblQuizzes
GROUP BY
tblQuizzes.StudentKey

to drop the lowest 2 quiz scores and average the remaining quizzes.

So, keeping in mind that I don't just want "a fish for a day" and know how to get the second lowest quiz score, but instead am interested in "learning how to fish" and write a MIN(table.value, offset) GROUP BY aggregate function (in C/C++ or PHP if necessary), can anyone provide some hints/complete answers with links and examples for this "fishing problem"?

Thanks much in advance for any assistance you can provide; I'm sorry if I seem rude in my "demands"/requests, I've just learned that vague requests result in workaround, "fish for a day" replies.

Best Answer

OK, it's almost working; the only thing that is wrong is I evidently don't know how to coerce the constant argument args->args[1] to (any) type of int value. If I code target_depth to a fixed integer, the program works. It's only when I try to read it in from the query that I get the "ERROR 2013 (HY000): Lost connection to MySQL server during query" error. Any help with reading in the target depth for UDF_SMALL (2nd lowest value, 3rd lowest value, etc.) would be greatly appreciated.

/******************************************************************************
 ** udf_small.c
 **
 ** This MySQL user-defined function (UDF) sorts through an aggregate dataset, 
 ** then returns the ith smallest element each group in the GROUP BY statement.
 **
 ** Author:        Jeffrey Rolland
 ** Creation Date: 12/07/2014
 ** Modifications: None
 ******************************************************************************/

/******************************************************************************
** A dynamically loadable file should be compiled shared.
** (something like: gcc -shared -o my_func.so -I /usr/includes/mysql/ my_func.c).
** You can easily get all switches right by doing:
** cd sql ; make udf_example.o
** Take the compile line that make writes, remove the '-c' near the end of
** the line and add -shared -o udf_example.so to the end of the compile line.
** The resulting library (udf_example.so) should be copied to some dir
** searched by ld. (/usr/lib/mysql/plugin/ ?)
** If you are using gcc, then you should be able to create the udf_example.so
** by simply doing 'make udf_example.so'.
**
** After the library is made one must notify mysqld about the new
** functions with the commands:
**
** CREATE FUNCTION metaphon RETURNS STRING SONAME "udf_example.so";
** CREATE FUNCTION myfunc_double RETURNS REAL SONAME "udf_example.so";
** CREATE FUNCTION myfunc_int RETURNS INTEGER SONAME "udf_example.so";
** CREATE FUNCTION sequence RETURNS INTEGER SONAME "udf_example.so";
** CREATE FUNCTION lookup RETURNS STRING SONAME "udf_example.so";
** CREATE FUNCTION reverse_lookup RETURNS STRING SONAME "udf_example.so";
** CREATE AGGREGATE FUNCTION avgcost RETURNS REAL SONAME "udf_example.so";
** CREATE FUNCTION myfunc_argument_name RETURNS STRING SONAME "udf_example.so";
**
** After this the functions will work exactly like native MySQL functions.
** Functions should be created only once.
**
** The functions can be deleted by:
**
** DROP FUNCTION metaphon;
** DROP FUNCTION myfunc_double;
** DROP FUNCTION myfunc_int;
** DROP FUNCTION lookup;
** DROP FUNCTION reverse_lookup;
** DROP FUNCTION avgcost;
** DROP FUNCTION myfunc_argument_name;
**
** The CREATE FUNCTION and DROP FUNCTION update the func@mysql table. All
** Active function will be reloaded on every restart of server
** (if --skip-grant-tables is not given)
**
** If you ge problems with undefined symbols when loading the shared
** library, you should verify that mysqld is compiled with the -rdynamic
** option.
**
** If you can't get AGGREGATES to work, check that you have the column
** 'type' in the mysql.func table. If not, run 'mysql_upgrade'.
**
**************************************************************************/

/*************************************************************************
** Syntax for the new aggregate commands are:
** CREATE AGGREGATE FUNCTION <function_name> RETURNS {STRING|REAL|INTEGER}
** SONAME <name_of_shared_library>
**
** Syntax for avgcost: AVGCOST( t.quantity, t.price )
** with t.quantity=integer, t.price=double
**
** (This example is provided by Andreas F. Bobak <bobak@relog.ch>)
**************************************************************************/

#ifdef STANDARD
    /* STANDARD is defined, don't use any mysql functions */
    #include <stdlib.h>
    #include <stdio.h>
    #include <string.h>
    #ifdef __WIN__
        typedef unsigned __int64 ulonglong; /* Microsofts 64 bit types */
        typedef __int64 longlong;
    #else
        typedef unsigned long long ulonglong;
        typedef long long longlong;
    #endif /*__WIN__*/
#else
    #include <my_global.h>
    #include <my_sys.h>
    #if defined(MYSQL_SERVER)
        #include <m_string.h>   /* To get strmov() */
    #else
        /* when compiled as standalone */
        #include <string.h>
        #define strmov(a,b) stpcpy(a,b)
        #define bzero(a,b) memset(a,0,b)
    #endif
#endif
#include <mysql.h>
#include <ctype.h>
#ifdef _WIN32
    /* inet_aton needs winsock library */
    #pragma comment(lib, "ws2_32")
#endif
#ifdef HAVE_DLOPEN
    #if !defined(HAVE_GETHOSTBYADDR_R) || !defined(HAVE_SOLARIS_STYLE_GETHOST)
        static pthread_mutex_t LOCK_hostname;
    #endif
#endif
#include <stdbool.h>

my_bool udf_small_init( UDF_INIT* initid, UDF_ARGS* args, char* message );
void udf_small_deinit( UDF_INIT* initid );
void udf_small_reset( UDF_INIT* initid, UDF_ARGS* args, char* is_null, char *error );
void udf_small_clear( UDF_INIT* initid, UDF_ARGS* args, char* is_null, char *error );
void udf_small_add( UDF_INIT* initid, UDF_ARGS* args, char* is_null, char *error );
double udf_small( UDF_INIT* initid, UDF_ARGS* args, char* is_null, char *error );

struct SortArray
{
    ulonglong length;
    double *data;
};

/*************************************************************************
** Example of init function
** Arguments:
** my_bool maybe_null 1 if function can return NULL
** Default value is 1 if any of the arguments
** is declared maybe_null
**
**
** initid Points to a structure that the init function should fill.
** This argument is given to all other functions.
**
** unsigned int decimals Number of decimals.
** Default value is max decimals in any of the
** arguments.
**
** unsigned int max_length Length of string result.
** The default value for integer functions is 21
** The default value for real functions is 13+
** default number of decimals.
** The default value for string functions is
** the longest string argument.
**
** char *ptr; A pointer that the function can use.
**
**
** args Points to a structure which contains:
** unsigned int arg_count Number of arguments
**
** enum Item_result *arg_type Types for each argument.
** Types are STRING_RESULT, REAL_RESULT
** and INT_RESULT.
**
** char **args Pointer to constant arguments.
** Contains 0 for not constant argument.
**
** unsigned long *lengths; max string length for each argument
** char *maybe_null Information of which arguments
** may be NULL
**
**
** message Error message that should be passed to the user on fail.
** The message buffer is MYSQL_ERRMSG_SIZE big, but one should
** try to keep the error message less than 80 bytes long!
**
** This function should return 1 if something goes wrong. In this case
** message should contain something usefull!
**************************************************************************/
my_bool udf_small_init( UDF_INIT* initid, UDF_ARGS* args, char* message )
{
    struct SortArray *sort = NULL;
    ulonglong target_depth;
    ulonglong i;

    if (args->arg_count != 2)
    {
        strcpy(message, "Wrong number of arguments: SMALL() requires two arguments");
        return 1;
    }
    if ((args->arg_type[0] != REAL_RESULT) || (args->arg_type[1] != INT_RESULT) )
    {
        strcpy(message, "Wrong argument type: SMALL() requires an REAL and a INT");
        return 1;
    }
    if (*((longlong*)args->args[1]) < 1)
    {
        strcpy(message, "Wrong argument value: SMALL() requires second parameter that is positive");
        return 1;
    }

    initid->maybe_null = 0; /* The result may be null */
    initid->decimals = 10; /* We want 10 decimals in the result */
    /* initid->max_length = 10; */ /* 10 digits + . + 10 decimals */

    target_depth = *((longlong*)args->args[1]);

    sort = malloc((sizeof(struct SortArray)));

    if(sort == NULL)
    {
        strcpy(message, "Couldn't allocate memory");
        return 1;
    }   

    initid->ptr = (char*)sort;

    sort->length = 0;   

    sort->data = malloc((target_depth)*(sizeof(double)));

    if(sort->data == NULL)
    {
        strcpy(message, "Couldn't allocate memory");
        return 1;
    }   

    for(i = 0; i < target_depth; i++)
    {
        sort->data[i] = 999999999;
    }

    return 0;
}

/****************************************************************************
** Deinit function. This should free all resources allocated by
** this function.
**
** Arguments:
** initid Return value from xxxx_init
****************************************************************************/
void udf_small_deinit( UDF_INIT* initid )
{
    free(initid->ptr);  
}

/****************************************************************************
** Small Aggregate Function.
**
** There are 3 extra functions for an aggregate function, xxx_reset, 
** xxx_clear, and xxx_add
****************************************************************************/

/****************************************************************************
** xxx_reset
**
** Arguments:
** initid Structure filled by xxx_init
**
** args The same structure as to xxx_init. This structure
** contains values for all parameters.
** Note that the functions MUST check and convert all
** to the type it wants! Null values are represented by
** a NULL pointer
**
** is_null If the result is null, one should store 1 here.
**
** message Error message that should be passed to the user on fail.
** The message buffer is MYSQL_ERRMSG_SIZE big, but one should
** try to keep the error message less than 80 bytes long!.
**
** xxx_reset gets done every time we begin processing the records for a new 
** group. It calls xxx_clear and xxx_add
****************************************************************************/

/* This is only for MySQL 4.0 compability */
void udf_small_reset(UDF_INIT* initid, UDF_ARGS* args, char* is_null, char* message)
{
    udf_small_clear(initid, args, is_null, message);
    udf_small_add(initid, args, is_null, message);
}

/****************************************************************************
** xxx_clear
**
** Arguments:
** initid Structure filled by xxx_init
**
** args The same structure as to xxx_init. This structure
** contains values for all parameters.
** Note that the functions MUST check and convert all
** to the type it wants! Null values are represented by
** a NULL pointer
**
** is_null If the result is null, one should store 1 here.
**
** message Error message that should be passed to the user on fail.
** The message buffer is MYSQL_ERRMSG_SIZE big, but one should
** try to keep the error message less than 80 bytes long!.
**
** xxx_clear resets the processing variables to their initial values for a 
** new group
****************************************************************************/

/* This is needed to get things to work in MySQL 4.1.1 and above */
void udf_small_clear(UDF_INIT* initid, UDF_ARGS* args, char* is_null __attribute__((unused)), char* message __attribute__((unused)))
{
    struct SortArray *sort = NULL;
    ulonglong target_depth;
    ulonglong i;

    sort = (struct SortArray*)initid->ptr;

    target_depth = *((longlong*)args->args[1]);

    /* if(sort == NULL)
    {
        strcpy(message, "Initid Pointer is NULL");
        exit(1);
    }

    if(sort->data == NULL)
    {
        strcpy(message, "Couldn't allocate memory");
        exit(1);
    } */

    sort->length = 0;

    for(i = 0; i < target_depth; i++)
    {
        sort->data[i] = 999999999;
    }
}

/****************************************************************************
** xxx_add
**
** Arguments:
** initid Structure filled by xxx_init
**
** args The same structure as to xxx_init. This structure
** contains values for all parameters.
** Note that the functions MUST check and convert all
** to the type it wants! Null values are represented by
** a NULL pointer
**
** is_null If the result is null, one should store 1 here.
**
** message Error message that should be passed to the user on fail.
** The message buffer is MYSQL_ERRMSG_SIZE big, but one should
** try to keep the error message less than 80 bytes long!.
**
** xxx_add is the main processing workhorse of a anggregate UDF. It processes
** each new record in the group.
****************************************************************************/

void udf_small_add(UDF_INIT* initid, UDF_ARGS* args, char* is_null __attribute__((unused)), char* message __attribute__((unused)))
{
    if (args->args[0] && args->args[1])
    {
        struct SortArray *sort = NULL;
        ulonglong target_depth;
        double col_value;
        ulonglong counter;
        ulonglong i, j;
        bool done;

        sort = (struct SortArray*)initid->ptr;

        target_depth = *((longlong*)args->args[1]); 

        col_value = *((double*)args->args[0]);  

        /* if(sort == NULL)
        {
            strcpy(message, "Initid pointer is NULL");
            exit(1);
        } */

        i = target_depth;
        counter = 1;
        done = FALSE;

        /* sort->data[0] = col_value + 1.0;
        sort->data[1] = col_value;

        col_value = 0.05; */

        while ((!(done)) && (i > 0))
        {
            if(col_value <= sort->data[i-1])
            {
                for(j = 0; j < i-1; j++)
                {
                    sort->data[j] = sort->data[j+1];
                }
                sort->data[i-1] = col_value;
                done = TRUE;
            }
            else
            {
                i--;
                counter++;
            }
        }

        /* if(counter >= sort->length)
        {
            sort->length = counter;
        } */

    }
}

/***************************************************************************
** UDF double function.
**
** Arguments:
** initid Structure filled by xxx_init
**
** args The same structure as to xxx_init. This structure
** contains values for all parameters.
** Note that the functions MUST check and convert all
** to the type it wants! Null values are represented by
** a NULL pointer
**
** is_null If the result is null, one should store 1 here.
**
** error If something goes fatally wrong one should store 1 here.
**
** This function should return a double. It returns a value at the end of 
** each group.
***************************************************************************/

double udf_small( UDF_INIT* initid, UDF_ARGS* args __attribute__((unused)), char* is_null, char* error __attribute__((unused)))
{
    struct SortArray *sort = NULL;
    ulonglong target_depth;

    sort = (struct SortArray*)initid->ptr;
    target_depth = *((longlong*)args->args[1]);

    sort->length = target_depth; /* Kill me once working */

    if(sort != NULL)
    {
        if(sort->length == target_depth)
        {
            *is_null = 0;
            return sort->data[0];
        }   
    }
    else
    {
        *is_null = 1;
        return  10.0;
    }
}