Postgresql – PL/pgSQL issues when function used twice (caching problem ?)

functionsplan-cacheplpgsqlpostgresqlpostgresql-9.4

I am facing an absolutely weird problem that feels much like a Postgres bug than an algorithm problem.

I have this function:

CREATE FUNCTION sp_connect(mail character varying, passwd character varying, role character varying)
  RETURNS json LANGUAGE plpgsql STABLE AS
$$
DECLARE
    user_info record;
BEGIN
  IF role = 'Role1' THEN
    SELECT u.id, r.name INTO user_info
    FROM users u
    INNER JOIN users_roles ur ON ur.user_id = u.id
    INNER JOIN roles r ON ur.role_id = r.id
    WHERE u.email = mail
      AND u.password = encode(digest(CONCAT(passwd, u.password_salt), 'sha512'), 'hex')
      AND r.name = 'Role1';
  ELSIF role = 'Role2' THEN
    SELECT h.id, 'Role1' AS name INTO user_info
    FROM history h
    WHERE h.email = mail
      AND h.password = encode(digest(CONCAT(passwd, h.password_salt), 'sha512'), 'hex');
  ELSE
    RAISE 'USER_NOT_FOUND';
  END IF;

  IF NOT FOUND THEN
      RAISE 'USER_NOT_FOUND';
  ELSE
      RETURN row_to_json(row) FROM (SELECT user_info.id AS id, user_info.name AS role) row;
  END IF;
END;
$$;

The problem I'm facing is when I use this function to log in with a Role1-user, then when I use it with a Role2-user, I get this error message:

type of parameter 7 (character varying) does not match that when preparing the plan (unknown)

Which is… well, I just don't understand where does it come from. If you wipe the database and change the login order (i.e. Role2 then Role1), this time, Role1 gets the error.

Strange issue, strange solutions… If I just use ALTER FUNCTION sp_connect but without modify anything inside the function, then magically, the two roles can login without any problem. I also tried this solution:

  IF NOT FOUND THEN
      RAISE 'USER_NOT_FOUND';
  ELSE
      IF role = 'Seeker'
      THEN
          RETURN row_to_json(row) FROM (SELECT user_info.id AS id, user_info.name AS role) row;
      ELSE
          RETURN row_to_json(row) FROM (SELECT user_info.id AS id, user_info.name AS role) row;
  END IF;

And by adding an IF and ELSE that is absolutely useless and use the same RETURN-clause, this does not trigger any error.

I know DBA StackExchange is not for developers but this kind of problem seems to be more like a caching problem or whatever. Can somebody can tell me if I am doing something wrong with PostgreSQL functions? Or where I may get help with this weird problem?

Best Answer

Explanation:

You declare user_info as record. The manual:

Record variables are similar to row-type variables, but they have no predefined structure. They take on the actual row structure of the row they are assigned during a SELECT or FOR command. The substructure of a record variable can change each time it is assigned to.

Bold emphasis mine.

You assign the record user_info and then derive a row type (actually called row in your code) from it in the RETURN statement. This is all fine and dandy, until you assign columns of different data types to the record in the next call within the same session. This is incompatible with the cached query plan of the prepared statement and raises an exception.

PL/pgSQL treats all SQL statements in the function body as prepared statements. The query plan for prepared statements is cached for the duration of the session unless any involved objects are changed (including the function itself), which deallocates all depending prepared statements. This explains why the next invocation after ALTER FUNCTION always worked. More explanation:

Solution

There are various ways around this. You found some yourself already. Just avoid feeding parameters of different data type to the same (prepared) statement

The simple solution would be to cast to the same data type. You didn't provide table definitions, I assume users.id and history.id are integer and match just fine. Judging from the error message I assume roles.name is varchar, so I cast the string literal 'Role1' to varchar as well and everything should work. (You might actually mean 'Role2', but that's orthogonal to the problem.)

An untyped string literal is coerced to a matching data type in some types of SQL statements where a data type can be derived from the context. But that's not the case here. Without explicit cast the string literal is type unknown, which is not the same as varchar (or text). This also shows in your error message.

CREATE FUNCTION sp_connect(mail varchar, passwd varchar, role varchar)
  RETURNS json AS
$func$
DECLARE
   user_info record;
BEGIN
  IF role = 'Role1' THEN
    SELECT u.id, r.name INTO user_info
    FROM   users u
    JOIN   users_roles ur ON ur.user_id = u.id
    JOIN   roles r ON ur.role_id = r.id
    WHERE  u.email = mail
    AND    u.password = encode(digest(CONCAT(passwd, u.password_salt), 'sha512'), 'hex')
    AND    r.name = 'Role1';

    RAISE NOTICE 'Role1: user_info.big_id: %; user_info.name: %'
                , pg_typeof(user_info.big_id), pg_typeof(user_info.name);  -- see data types
  ELSIF role = 'Role2' THEN
    SELECT h.id, 'Role1'::varchar AS name INTO user_info  -- Cast! And did you mean 'Role2'?
    FROM   history h
    WHERE  h.email = mail
    AND    h.password = encode(digest(CONCAT(passwd, h.password_salt), 'sha512'), 'hex');

    RAISE NOTICE 'Role2: %: user_info.big_id: %; user_info.name: %'
                , pg_typeof(user_info.big_id), pg_typeof(user_info.name);  -- see data types
  END IF;

  IF NOT FOUND THEN
      RAISE 'USER_NOT_FOUND';
  ELSE
      RETURN row_to_json(row) FROM (SELECT user_info.id AS id, user_info.name AS role) row;
  END IF;
END
$func$  LANGUAGE plpgsql STABLE;

The function can be STABLE, that's the correct volatility.

I also added RAISE NOTICE statements to show data types, which should help you debug. Note that the plan for the RAISE statement itself is cached as well, so a single RAISE after END IF would be subject to the same problem.