Of course, it's an understandable approach, since couch view functions are required to be idempotent (running the same function with the same input must produce the same output), and at face value, you'd think that generating a random number in the view function would produce different results each time, even with the same input. However, that's not necessarily the case.
Note: none of the following should be used for anything even remotely cryptographic (as that's not the intended or achieved purpose).
The way around this is to make the random-number generation itself idempotent. After all, unless you ask for truely random specifically, 'random' implementations use pseudo-random number generation anyways, so why not? An example of how to accomplish this with a Python view function is:
def outer(): import random #from hashlib import md5 def inner(doc): seed = doc['_id'] #seed = md5(seed).digest() random.seed(seed) yield random.random(), {'_id': doc['_id']} return inner outer = outer()
The use of a hashing function like md5 may improve the distribution of values, though probably not (if it did, that'd say a lot about your PRNG implementation), not to mention it takes more time.
This function gives each document a random position in the view, but that general position will not change over the lifetime of the document because the document id will never change. If the position of a document within the 'random' view needs to change each time the document is updated, we can use the following function:
def outer(): import random #from hashlib import md5 def inner(doc): seed = repr(doc) #seed = md5(seed).digest() random.seed(seed) yield random.random(), {'_id': doc['_id']} return inner outer = outer()
It is common for Couch documents to contain a 'type' value. We can also trivially make views that do randomization within type-groups:
def outer(): import random #from hashlib import md5 def inner(doc): seed = repr(doc) #seed = md5(seed).digest() random.seed(seed) key = [doc.get('type'), random.random()] yield key, {'_id': doc['_id']} return inner outer = outer()
In all of these cases, we're doing what views are meant to do: keeping computable information out of the document. This is especially sensible when you consider that a random value for these purposes really is just junk data. It's a means to an end, but rarely do you actually care about the random value itself. Furthermore, because this method of generating (pseudo) random numbers is idempotent, it's semantically equivalent to recipes that store the random value in the document.
No comments:
Post a Comment