jeudi 25 juin 2015

Dynamic Pattern Matching SQL with counts

In my case I am trying to see if there has been anyone who has devised a way to obtain an output that would traverse through all records for a given column and find all the patterns (whether it returns REGEX or the VALUES), then count how many times that exists. I know a lot of 1 counts would come back, so I more than likely would include a HAVING > 5 for instance.

Table =
dbo.PEOPLE

Columns =
LogID [int] IDENTITY(1,1) NOT NULL
,SourceData [varchar] (50) NOT NULL --which could include values that have numeric, alpha, and characters
,Account [nvarchar] (50) NULL

EXAMPLE DATA SET of table output (pipe delimited)
1|123X-456|56789
2|456/123|A345
3|ABC 345|X567
4|456 HELP-123|YCHB
5|456/987|2345
6|CDE 345|XX987
7|345X-456|YY6767

So what I would like my output to look like is two columns - PATTERN|COUNT
%X-456|2 --pulling in LogID rows 1 and 7, finding the like pattern and omitting everything else replacing with wildcard
456/%|2 --pulling in LogID rows 2 and 5
% 345|2 --pulling in LogID rows 3 and 6

Now, if someone has something that returns REGEX pattern vs. actual pattern values, then I would work with that.

Essentially I figured I would need to traverse through each character of a field, after reading the LEN of the field, and starting with position/char 1 reading each character until the end and writing each character to a TEMP with a cursor/CTE. Then coming back through and seeing how many times I can match that pattern where it is, for instance > 5 as stated at first. I know the execution plan will be robust and consume a lot of my SQL server resources, but it has to be easier than hiring people to find patterns.

Thanks in advance.

Aucun commentaire:

Enregistrer un commentaire