I need to generate a resource domain prefix based on a given path and configured number of resource domains in a deterministic fashion with good distribution. For example, if you pass it the path "/script/site.js" it returns "res#", where '#' is an integer between 0 and the configured amount of resource domains.
Using C# 3.0.
So far have this:
var resourceDomainCount = 4;
var hasher = System.Security.Cryptography.SHA1.Create();
var paths = new [] {
"/App_Themes.0.1.433.232/images/buttonsBackgrounds.jpg",
"/App_Themes.0.1.433.232/images/blahblah.jpg",
"/App_Themes.0.1.433.232/images/pagebg.gif",
"/App_Themes.0.1.433.232/site.css",
"/script/site.js",
"/App_Themes.0.1.433.232/images/different.jpg",
"/App_Themes.0.1.433.232/images/shadows.jpg",
"/Handlers/ImageHandler.ashx?type=l&id=123&s=g&index=0",
"/Handlers/ImageHandler.ashx?type=l&id=234&s=g&index=0",
"/Handlers/ImageHandler.ashx?type=l&id=345&s=g&index=0",
"/Handlers/ImageHandler.ashx?type=p&id=MyProduct&s=g&index=0",
"/Handlers/ImageHandler.ashx?type=p&id=WineGreat&s=g&index=0",
"/Handlers/ImageHandler.ashx?type=p&id=YayYay&s=g&index=0"
};
foreach (var path in paths) {
var pathHash = hasher.ComputeHash(Encoding.ASCII.GetBytes(path));
var singleByteHash = pathHash.Aggregate(0, (a, b) => a ^ b);
var random = new Random((int)singleByteHash);
var resourceDomainIndex = random.Next(0, resourceDomainCount);
(resourceDomainIndex + ": " + path).Dump();
}
Which produces the following:
3: /App_Themes.0.1.433.232/images/buttonsBackgrounds.jpg
0: /App_Themes.0.1.433.232/images/blahblah.jpg
1: /App_Themes.0.1.433.232/images/pagebg.gif
1: /App_Themes.0.1.433.232/site.css
3: /script/site.js
1: /App_Themes.0.1.433.232/images/different.jpg
3: /App_Themes.0.1.433.232/images/shadows.jpg
1: /Handlers/ImageHandler.ashx?type=l&id=123&s=g&index=0
1: /Handlers/ImageHandler.ashx?type=l&id=234&s=g&index=0
0: /Handlers/ImageHandler.ashx?type=l&id=345&s=g&index=0
2: /Handlers/ImageHandler.ashx?type=p&id=MyProduct&s=g&index=0
1: /Handlers/ImageHandler.ashx?type=p&id=WineGreat&s=g&index=0
0: /Handlers/ImageHandler.ashx?type=p&id=YayYay&s=g&index=0
Not getting the distribution I want (there is only one instance of '2' there).
There are thousands of input strings that change all the time and the paths will be generated at runtime, e.g.: <a href="<%= GetPath("/script/site.js") %>">Link</a>
-
Your sample is far, far too small. Randomness is generally like statistics. It's only borne out over a long period of time. Try rerunning your code over a 100 iterations, and see if each value comes up about the same amount of time. Then try again for 10,000 iteratons and notice that the actual number of results are closer percentage-wise. Then run it for 1 million iterations, and notice that it's even closer. It never needs to be exactly the same - otherwise it wouldn't really be random.
And if the results don't follow that pattern, then maybe you've got a problem with teh even distribution issue.
Damian Edwards : I understand what you're saying, but I'd like to find an algorithm that provides good distribution even over small sets. Imaging a page that shows 25 search results, each with their own image. I'd like good distribution between 4 resource domains for those 25 image paths.atk : If your intent is to guarantee equal distribution over small sets, why not just use a round robin? Or populate all the data into a set, then extract & return a random element from that set (and whenever the set is empty, refill it with each possible element)? -
This is a lot simpler than hashing the string (well, presumably it's already a hash anyway) and seems to get a better distribution:
Math.Abs(path.GetHashCode()) % resourceDomainCountDamian Edwards : This works well, even for a small set. Cheers.Damian Edwards : Btw, substituting that into the original snippet gets this result: 0,2,1,2,3,1,3,0,1,2,1,2,2
0 comments:
Post a Comment