OK, I am making a lot of assumptions (INT instead of VARCHAR(50) being one of them) with this answer, so feel free to correct me if needed. The problem with Option B is that it introduces a new join to relate Users to Alerts without any real added benefit. If you are joining on UserID, it is best to index UserID so your joins can use seeks.
For Option A, UserID will be the clustering key (the index key of the clustered index) on the Users table, and UserID will be a nonclustered index key on the Alerts table. This will cost 16 bytes per Alert.
For Option B, UserID will be the clustering key on the Users table. UserID will probably be the clustering key on UserMap too, to make joining more efficient. UserKey (assuming this is an INT) would then be a nonclustered index key on the Alerts table. This will cost 4 bytes per Alert, and 20 bytes per UserMap row.
Looking at the big picture: one relationship under Option A costs 16 bytes of storage and involves 1 join operation, whereas one relationship under Option B costs 24 bytes of storage and involves 2 join operations.
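To make the comparison concrete, here is a minimal sketch of how I am picturing the two options (everything beyond UserID and UserKey, including the table and index names, is an assumption on my part):

-- Option A: Alerts references Users directly on the uniqueidentifier.
CREATE TABLE dbo.Users
(
  UserID UNIQUEIDENTIFIER NOT NULL PRIMARY KEY CLUSTERED
);

CREATE TABLE dbo.Alerts_OptionA
(
  AlertID INT IDENTITY(1,1) PRIMARY KEY,
  UserID UNIQUEIDENTIFIER NOT NULL
    FOREIGN KEY REFERENCES dbo.Users(UserID) -- 16 bytes per Alert
);
CREATE INDEX IX_Alerts_UserID ON dbo.Alerts_OptionA(UserID);

-- Option B: a UserMap table translates the uniqueidentifier to an INT surrogate.
CREATE TABLE dbo.UserMap
(
  UserID UNIQUEIDENTIFIER NOT NULL PRIMARY KEY CLUSTERED, -- 16 bytes
  UserKey INT IDENTITY(1,1) NOT NULL UNIQUE -- + 4 bytes = 20 bytes per UserMap row
);

CREATE TABLE dbo.Alerts_OptionB
(
  AlertID INT IDENTITY(1,1) PRIMARY KEY,
  UserKey INT NOT NULL
    FOREIGN KEY REFERENCES dbo.UserMap(UserKey) -- 4 bytes per Alert
);
CREATE INDEX IX_Alerts_UserKey ON dbo.Alerts_OptionB(UserKey);

With Option A, Alerts joins straight to Users on UserID; with Option B, anything that starts from the uniqueidentifier has to go through UserMap first, which is the extra join I mentioned above.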
Furthermore, there are roughly 3.4 × 10^38 (2^128) possible uniqueidentifiers and only 4,294,967,296 (2^32) INTs. Implementing a uniqueidentifier-to-INT map for this type of relationship could cause unexpected results once you start reusing INTs.
The only reason for creating this type of map table is if you plan on creating a many-to-many relationship between Users and Alerts.
Taking all of this into consideration, I would recommend Option A.
I hope this helps,
Matt
To reduce space requirements you might consider a few things:
- Not bothering to store http:// or the leading www. - this is just wasted space (though in a few cases the www. is required because people don't know how to configure their web sites properly). A small cleanup sketch follows the schema below.
- Making sure you use data compression. Most systems are still I/O-bound, not CPU-bound.
- Only storing each domain name once, and storing the page URLs separately. Both may repeat, and storing them multiple times is wasteful. So, for example, instead of:
CREATE TABLE dbo.Pages
(
  ID BIGINT IDENTITY(1,1) PRIMARY KEY, -- need BIGINT for 500 billion rows
  URL VARCHAR(2048) -- far too large to apply UNIQUE
);

CREATE TABLE dbo.PageLinks
(
  PageID BIGINT NOT NULL FOREIGN KEY REFERENCES dbo.Pages(ID),
  LinkID BIGINT NOT NULL FOREIGN KEY REFERENCES dbo.Pages(ID),
  PRIMARY KEY (PageID, LinkID)
);
You could do something like this:
CREATE TABLE dbo.Domains
(
  DomainID INT IDENTITY(1,1) PRIMARY KEY, -- probably no more than 2BN domains
  DomainName VARCHAR(255) NOT NULL
) WITH (DATA_COMPRESSION = PAGE);

CREATE UNIQUE INDEX dn ON dbo.Domains(DomainName)
  WITH (DATA_COMPRESSION = PAGE);

CREATE TABLE dbo.URLs
(
  URLID INT IDENTITY(1,1) PRIMARY KEY, -- maybe you need BIGINT here
  URL VARCHAR(2048) NOT NULL -- still can't apply UNIQUE here
  -- but you can have the same URL (e.g. /page.php) from two different
  -- domains only listed once.
);

CREATE TABLE dbo.DomainURLs
(
  DomainURLID INT IDENTITY(1,1) PRIMARY KEY, -- may also need BIGINT here
  DomainID INT NOT NULL FOREIGN KEY REFERENCES dbo.Domains(DomainID),
  URLID INT NOT NULL FOREIGN KEY REFERENCES dbo.URLs(URLID)
) WITH (DATA_COMPRESSION = PAGE);

CREATE UNIQUE INDEX du ON dbo.DomainURLs(DomainID, URLID)
  WITH (DATA_COMPRESSION = PAGE);
Yes, table design and query semantics will be much more complex, but it will scale much better once you index many pages on the same site (or many pages with the same URL across different sites).
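As for not bothering to store http:// or the leading www. (the first bullet above), the cleanup can happen before you ever split and store the value; a rough, untested sketch (the variable names are just placeholders):

DECLARE @RawURL VARCHAR(2048) = 'http://www.example.com/page.php';
DECLARE @CleanURL VARCHAR(2048) = @RawURL;

-- strip the scheme, then any leading www., before splitting into domain and page
IF @CleanURL LIKE 'http://%'  SET @CleanURL = SUBSTRING(@CleanURL, 8, 2048);
IF @CleanURL LIKE 'https://%' SET @CleanURL = SUBSTRING(@CleanURL, 9, 2048);
IF @CleanURL LIKE 'www.%'     SET @CleanURL = SUBSTRING(@CleanURL, 5, 2048);

SELECT @CleanURL; -- example.com/page.php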
Just to demonstrate the potential for space savings here, let's take just the storage of the URLs, ignoring the links between them, and look at only the Pages, Domains, and URLs tables (and to be fair, I'll even test your Pages table with compression).
Note that we take advantage of the fact that you will probably index each site in turn, alphabetically, rather than hash and index random URLs each iteration. This allows compression to work as well as possible in this case.
For some sample data: on my system this generates about 121,000 rows, but that will vary depending on a lot of factors, such as how many databases are on your system, how many objects are in your database, the design of any non-system objects (e.g. number of columns), and even @@VERSION:
;WITH x AS
(
  SELECT
    d = t.name + CONVERT(VARCHAR(5), d.database_id) + '.com',
    p = '/' + c.name + '.php'
  FROM sys.all_objects AS t
  CROSS JOIN sys.databases AS d
  INNER JOIN sys.all_columns AS c
    ON t.[object_id] = c.[object_id]
)
SELECT d, p
FROM x
ORDER BY d, p;
Sample results:
all_columns1.com /collation_name.php
all_columns1.com /column_id.php
all_columns1.com /default_object_id.php
all_columns1.com /is_ansi_padded.php
all_columns1.com /is_column_set.php
...
Now let's use that to populate our four tables:
;WITH x AS
(
  SELECT
    d = t.name + CONVERT(VARCHAR(5), d.database_id) + '.com',
    p = '/' + c.name + '.php'
  FROM sys.all_objects AS t
  CROSS JOIN sys.databases AS d
  INNER JOIN sys.all_columns AS c
    ON t.[object_id] = c.[object_id]
)
SELECT d, p
INTO #blat
FROM x
ORDER BY d, p;

CREATE CLUSTERED INDEX x ON #blat(d, p);
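-- Note: dbo.Pages_compressed (used below) is assumed to have the same definition as
-- dbo.Pages, just created WITH (DATA_COMPRESSION = PAGE), per the note above about
-- also testing your Pages table with compression.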
INSERT dbo.Pages(URL) SELECT 'http://www.' + d + p FROM #blat ORDER BY d, p;
INSERT dbo.Pages_compressed(URL) SELECT 'http://www.' + d + p FROM #blat ORDER BY d, p;
INSERT dbo.Domains(DomainName) SELECT DISTINCT d FROM #blat ORDER BY d;
INSERT dbo.URLs(URL) SELECT DISTINCT p FROM #blat;
Now of course you would have to build the junction table here by cross-referencing the Domains and the URLs (a sketch for that follows the space figures below). But this will still show how much space you can save by not storing the same domain name or page name twice, even if it means more complicated logic to put these things together:
EXEC sp_spaceused 'dbo.Pages'; -- 8,904 KB
EXEC sp_spaceused 'dbo.Pages_compressed'; -- 4,552 KB
EXEC sp_spaceused 'dbo.Domains'; -- 656 KB
EXEC sp_spaceused 'dbo.URLs'; -- 136 KB
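To build the junction table I mentioned, something like this should work (an untested sketch, joining the staged rows back to the two lookup tables):

INSERT dbo.DomainURLs(DomainID, URLID)
SELECT DISTINCT dom.DomainID, u.URLID
FROM #blat AS b
INNER JOIN dbo.Domains AS dom
  ON dom.DomainName = b.d
INNER JOIN dbo.URLs AS u
  ON u.URL = b.p;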
That's with 121,000 URLs. For 500 billion? Let's extrapolate. Per complete URL, without compression, you are storing about 73 bytes. With compression, 38 bytes per URL. My method: 6.5 bytes per URL. (Discard, for a moment, how unrealistic my sample data is. Let's pretend I've come somewhere close to the average URL length, and that both (a) you will be indexing many URLs on every domain and (b) you will get duplication of page paths across many domains.)
So ignoring those realities, simple math shows more than 90% reduction in storage space between your uncompressed method and my method:
Your method, uncompressed = 33.2 TB
Your method, compressed = 17.1 TB
My method = 2.9 TB
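The arithmetic behind those totals is just bytes per URL × 500,000,000,000 rows ÷ 1024^4 bytes per TB; for example, using the rounded per-URL sizes above (the small differences from the quoted figures are only rounding of the per-URL sizes):

DECLARE @Rows FLOAT = 500000000000;  -- 500 billion URLs
DECLARE @TB   FLOAT = 1099511627776; -- bytes per TB (1024^4)

SELECT
  YoursUncompressed = 73.0 * @Rows / @TB, -- ~33 TB
  YoursCompressed   = 38.0 * @Rows / @TB, -- ~17 TB
  MyMethod          =  6.5 * @Rows / @TB; -- ~3 TB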
And again, it's more complicated, so more work up front, but usually this is worth it. You only design the schema and write the code around it once; you'll be maintaining this for, well, how long do you expect the service to exist?
Best Answer
A long time ago, back in 2000, I had a similar question and posted it on Google Groups. I received an awesome response from a guy called Joe Celko, and it fits right into your problem. I've been using this methodology ever since; it has great performance and works really well.
https://groups.google.com/forum/#!search/pascaljr/microsoft.public.sqlserver.programming/knQ5WXIDfeo/E0vfeC2kbTsJ
It uses two integer values, a left and a right value, to store the hierarchy. I've made some adaptations to the table and procs over the years, but the essence remains the same.
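In case that link ever disappears, here is a minimal sketch of the idea as I use it (my own simplified table and column names, not the originals from that thread): every node stores a left and a right number, and a node's descendants are exactly the rows whose numbers fall inside its own range.

CREATE TABLE dbo.Hierarchy
(
  NodeID INT NOT NULL PRIMARY KEY,
  NodeName VARCHAR(50) NOT NULL,
  lft INT NOT NULL,
  rgt INT NOT NULL
);

INSERT dbo.Hierarchy(NodeID, NodeName, lft, rgt)
VALUES (1, 'Root',   1, 8),
       (2, 'Child1', 2, 5),
       (3, 'Leaf1',  3, 4),
       (4, 'Child2', 6, 7);

-- all descendants of Root: their (lft, rgt) pairs sit inside Root's range
SELECT child.NodeName
FROM dbo.Hierarchy AS parent
INNER JOIN dbo.Hierarchy AS child
  ON child.lft > parent.lft
 AND child.rgt < parent.rgt
WHERE parent.NodeName = 'Root';

The big win is that a whole subtree comes back from a single range predicate instead of a recursive join.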