Sql-server – Generate fake data for several tables with relationships

insertrandomrelationssql server

I have 3 tables where I want to generate fake data in UsersCategoryLink.

How can I insert in the table UserCategoryLink the column UserID with random users from the table User and a random id from the table Categories. In this SQL fiddle you can see the tables with some values.

  • UsersCategoryLink must be filled with random users and categories.
  • Every user must have two categories.
  • UserID and CategoryID are primary keys, so every value must be unique.
  • I am using SQL Server Express.

Tables and relations with some information what I want

Best Answer

This is my version of the cross-reference table where you'll store users and the categories they are members of. Notice there is a primary key clustered index on (UserID, CategoryID); this ensures each row is unique:

CREATE TABLE dbo.XREFUserCategories
(
    UserID INT NOT NULL
        CONSTRAINT FK_XREFUserCategories_UserID
        FOREIGN KEY 
        REFERENCES dbo.Users(UserID)
    , CategoryID INT NOT NULL
        CONSTRAINT FK_XREFUserCategories_CategoryID
        FOREIGN KEY 
        REFERENCES dbo.Categories(CategoryID)
    , CONSTRAINT PK_XREFUserCategories
        PRIMARY KEY CLUSTERED (UserID, CategoryID)
);
GO

Since you already have data in the Users and Categories tables, you can use a CTE with the ROW_NUMBER windowing function partitioned by UserID and ordered by an essentially random value, NEWID(). This allows us to pick precisely two "categories" for each "user":

;WITH rs AS 
(
    SELECT u.UserID
        , c.CategoryID
        , rn = ROW_NUMBER() OVER (
            PARTITION BY u.UserID 
            ORDER BY NEWID()
    )
    FROM dbo.Users u
        , dbo.Categories c
)
INSERT INTO dbo.XREFUserCategories(UserID, CategoryID)
SELECT rs.UserID
    , rs.CategoryID
FROM rs
WHERE rs.rn <= 2;

The following is a sample of rows I generated on my system:

enter image description here