I followed this procedure to check to see if a table lock is preventing a table from being deleted, but finding none. However, I am still unable to drop a table in Redshift. Are there other system tables I can check to see if something is locking a table and preventing the drop?
Redshift DROP and TRUNCATE Hanging with No Locks – Solution
redshift
Related Solutions
I think your understanding of query queues is a little off.
A queue is like a thread in Java. A query arrives and is designated to the "less loaded" queue, and it waits for its turn to be resolved.
The decision of where to put a query is independent of how busy a queue is; queries are allocated based on the rules you've set up: http://docs.aws.amazon.com/redshift/latest/dg/cm-c-wlm-queue-assignment-rules.html
We can have as many queues as we like.
Not quite, check out http://docs.aws.amazon.com/redshift/latest/dg/cm-c-defining-query-queues.html. The relevant line is
The maximum total concurrency level for all user-defined queues, not including the reserved Superuser queue, is 50.
So the concurrency is definitely limited, but that's a reasonable restriction because each concurrency slot reserves some of the cluster's resources.
A queue has some memory allocated (we guess divided equally?)
By default memory is split equally among queues, but you can assign memory at the granularity of 1% of cluster memory. See the above link about defining query queues.
In a queue we can assign user groups or queries groups. But in short-term, that's a lot of classification work in our queries we can't do right now.
Creating user groups and setting query groups is actually pretty simple, see http://docs.aws.amazon.com/redshift/latest/dg/r_CREATE_GROUP.html and http://docs.aws.amazon.com/redshift/latest/dg/r_query_group.html.
If you didn't set up user groups or query groups, then my guess is that you didn't see any improvement after adding the extra queues because all the queries still ran in a single query queue. (In fact, the extra queues would simply take resources from the queue running your queries.) Since all your queries are running from the same user and for the same purpose, it makes sense that they should all resolve to the same query queue. A better solution might be to have only a single queue, but increase the concurrency level. (Keep in mind that a query queue's resources are divided equally among all concurrency slots, even if they're not being used, so a concurrency level of 50 means no query gets more than 2% of the total resources.)
All that being said, redshift is a higher-latency system than many other database solutions. If you're only looking to run lots of small queries, then it might not be the best fit for your problem.
You are definately on the right track with Kimball rather than inmon for Redshift.
There are a number of patterns for this, I have used them all in different use cases
- "ELT" pattern - Load the source tables to redshift fully, do not do any significant transformations until the data has been loaded. For this you can either load to s3, then use redshift copy command or I would recommend using "AWS data migration services", which can sync a source (e.g.mysql or postgres) to a target (e.g. redshift) Then, on a regular basis run sql processes within redshift to populate dims then facts. You can use third part cloud based tools to "simplify" this process if you want to - such as Matillion (i do not recommend using a third party tool)
- "ETL pattern" - Transform the data in flight, using apache spark. and load the dims and facts into redshift spark->s3->redshift. I have used EMR for this which is good. this is also the approach taken if you use AWS Glue
- Do not transform ! - similar to 1) but just use the tables that have been loaded.
Note that Redshift sometimes works BETTER if you have a wide table with repeated values rather than a fact and dimensions. The reason for this is that the columnar approach lets Redshift compress the different values down to a level that is pretty efficient. I do not have a formula for when to use many Dimensions vs a flat wide table, the only way is to try it and see!
Some links
Best Answer
Please try this code to identify locks over past 7 days.
This is from the link here lock_wait.sql