There are several tools on the market that help to prepare sample data. One of them is Visual Studio Database Edition, the other is SQL Data Generator from RedGate. I am sure there are many more.
The drawback of the above two excellent products is that they are not free. In some cases this is a major obstacle, because bosses are somehow not as likely to spend money for your tools (and toys) as you would like them to be.
So sometimes you have to implement sample data generator yourself. As it turns out, it is not too complicated (especially for simple cases).
Let’s assume that we want to replace all names in Person.Contact table in AdventureWorks database.
There are 19972 rows in this table in my database. I would like to replace all FirstName, MiddleName and LastName values with values that I prepare. Obviously, it is not so easy to come up with 20000 other names which don’t belong to Klingon language.
But if you think about this, 20000 is 50 x 40 x 10. If you have one fifty first names, forty last names and ten middle names, you can create 20000 unique combinations of these. You can easily find lists of names if you search, but for example this page contains all names that we might need to populate 20000 rows.
For this example, I created three tables, each for separate list of names:
1: create table #RandomLastNames(RandomName nvarchar(50))
2: create table #RandomFirstNames(RandomName nvarchar(50))
3: create table #RandomMiddleNames(RandomName nvarchar(50))
I populated them using the names I found on the page I linked before. For middle names, I just typed in several initials and a NULL value, as NULLs are present in MiddleName column in Person Contact.
1: insert #RandomMiddleNames (RandomName)
2: select N'J.' union all
3: select N'T.' union all
4: select N'R.' union all
5: select N'A.' union all
6: select N'C.' union all
7: select N'M.' union all
8: select N'W.' union all
9: select N'D.' union all
10: select N'S.' union all
11: select N'K.' union all
12: select N'Z.' union all
13: select NULL
To get number of combinations of the data you just select count from cartesian product of all three tables:
1: select count(*) [No of combinations] from
2: #RandomFirstNames cross join
3: #RandomLastNames cross join
4: #RandomMiddleNames
The rest is easy. To preview how data you have will be replaced you run this query:
1: select cn.ContactID, cn.FirstName + N' ' + isnull(cn.MiddleName + N' ', N'') + cn.LastName OriginalData,
2: (select top 1 RandomName from #RandomFirstNames where cn.ContactId = cn.ContactId order by newid()) FirstName,
3: (select top 1 RandomName from #RandomMiddleNames where cn.ContactId = cn.ContactId order by newid()) MiddleName,
4: (select top 1 RandomName from #RandomLastNames where cn.ContactId = cn.ContactId order by newid()) LastName
5: into #DataMapping
6: from Person.Contact cn
To actually update the data just run this update:
1:
2: update Person.Contact set
3: FirstName = b.FirstName,
4: MiddleName = b.MiddleName,
5: LastName = b.LastName
6: from Person.Contact a inner join
7: #DataMapping b on a.ContactId = b.ContactId
And that’s it! You can modify the script I attach to suit your needs, you can modify other data the same way, like emails, address lines etc.
Sample code download here.
No comments:
Post a Comment