Announcing an updated drawing protocol for the EffectiveAltruism.org donor lotteries

post by SamDeere · 2019-01-24T22:22:45.456Z · score: 7 (8 votes) · EA · GW · 13 comments

Contents

  Methodology
  Worked Example
  Reference implementation
    Further info
13 comments

Update 28 Jan 2019: We've reverted to using the original NIST Beacon protocol for the draw.

It has come to our attention that the public source of randomness used to draw the EffectiveAltruism.org donor lottery — the NIST Randomness Beacon — is not currently operating due to the ongoing US government shutdown. This means our regular method of drawing winning numbers is unavailable.

Methodology

There have been several alternative methods suggested for a good public source of randomness. After careful deliberation, we have chosen to use the Incorporated Research Institutions for Seismology (IRIS) list of earthquakes as a randomness source. The data includes a range of numbers that will be impossible to predict in advance, including the latitude, longitude, magnitude, and depth of the earthquake.

We will choose the first earthquake appearing on this list of large earthquakes with a timestamp immediately after the lottery draw date.

Specifically, we will:

We will do the drawing at least 24 hours after the draw date to ensure that IRIS has had time to process incoming data.

There are two lotteries up for drawing ($100k and $500k), currently with draw times staggered by five minutes. In order to ensure there are two separate drawings, we will reset their draw timestamps to be identical, and then use the first two earthquake events appearing on the list after that timestamp as follows:

Worked Example

Let’s assume that the donor lotteries in question closed at midnight on January 23, 2019. The two earthquake events immediately after this timestamp are 10998501 and 10998539 respectively.

Reference implementation

The following bash script illustrates the process for generating the hashes:

#!/bin/bash

#  Bash script for calculating the winning lottery number using earthquake
#  data from IRIS (Incorporated Research Institutions for Seismology).
#
#  Usage:
#         ./draw_lottery_iris.sh iris_id
#   e.g.  ./draw_lottery_iris.sh 10998811
#
#  The script works as follows:
#  - get the response from the IRIS server for the relevant event
#  - trim newlines from the response
#  - strip the response to just the numeric digits
#    (awk prints the result followed by one newline, so that newline is
#    part of the input that gets hashed)
#  - get the SHA256 hash of the digit string in binary
#  - cast the binary hash to a hexdump (only keeping the first line)
#  - truncate the hash to its first 10 characters

EVENT_ID="$1"

curl -s "http://service.iris.edu/fdsnws/event/1/query?eventid=$EVENT_ID&format=text" \
 | tr -d '\n' \
 | awk '{gsub(/[^0-9]/,"")}1' \
 | openssl dgst -sha256 -binary \
 | xxd -p | head -n 1 \
 | cut -c -10
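
For anyone who wants to check the hashing steps without hitting the IRIS server, here is a minimal sketch that applies the same steps to a response that has already been saved. The function name `hash_saved_response` is hypothetical, and `sha256sum` (GNU coreutils) is swapped in for the `openssl dgst | xxd` pair, which should give the same leading hex digits. Note that awk's print appends a newline, so the string that actually gets hashed is the digit string followed by a newline.

```shell
#!/bin/bash

# Hypothetical helper, not part of the original script: read a saved
# IRIS response on stdin and print the first 10 hex digits of its hash.
# Assumes GNU coreutils' sha256sum is installed.
hash_saved_response() {
  # 1. drop newlines, 2. keep only digits (awk re-adds one trailing
  # newline), 3. hash, 4. keep the first 10 hex characters
  tr -d '\n' \
    | awk '{gsub(/[^0-9]/,"")}1' \
    | sha256sum \
    | cut -c -10
}
```

Usage would look like `hash_saved_response < saved_response.txt`, where `saved_response.txt` is whatever file the response was saved to.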

Further info

To avoid any more late-breaking changes, we'll commit to using this protocol even in the event that the NIST beacon comes back online before the lottery draw date.

EDIT 28 Jan 2019: The NIST beacon has come back online. Notwithstanding the statement that we would stick with the above protocol, we've decided that it would be best to revert to the NIST Beacon as our source of randomness. That is, the drawing procedure will be as it originally was (as described in the donor lottery's methodology section).

If you have any more questions, please comment below.

Crossposted from the Centre For Effective Altruism blog

13 comments

Comments sorted by top scores.

comment by SamDeere · 2019-01-28T23:49:30.452Z · score: 7 (2 votes) · EA · GW

The NIST Beacon is back online. After consulting a number of people (and notwithstanding that we previously committed to not changing back), we've decided that it would in fact be better to revert to using the NIST beacon. I've edited the post text to reflect this, and emailed all lottery participants.

comment by vipulnaik · 2019-01-28T05:32:19.194Z · score: 2 (2 votes) · EA · GW

It looks like the NIST randomness beacon will be back in time for the draw date of the lottery. https://www.nist.gov/programs-projects/nist-randomness-beacon says "NIST will reopen at 6:00 AM on Monday, January 28, 2019."

Might it make sense to return to the NIST randomness beacon for the drawing?

comment by rafa_fanboy · 2019-01-25T13:49:37.579Z · score: 1 (3 votes) · EA · GW

random.org ?

comment by SamDeere · 2019-01-25T18:07:11.346Z · score: 3 (3 votes) · EA · GW

AFAIK random.org offers to run lotteries for you (for a fee), but all participants still need to trust them to generate the numbers fairly. It's obviously unlikely that there would in fact be any problem here, but we're erring on the side of having something that's easier for an external party to inspect.

comment by Paul_Christiano · 2019-01-25T18:20:31.614Z · score: 4 (3 votes) · EA · GW

Trusting random.org doesn't seem so bad (probably a bit better than trusting IRIS, since IRIS isn't in the business of claiming to be non-manipulable). I don't know if they support arbitrary winning probabilities for draws, but probably there is some way to make it work.

(That does seem strictly worse than hashing powerball numbers though, which seem more trustworthy than random.org and easier to get.)

comment by beth · 2019-01-25T07:52:02.893Z · score: 0 (2 votes) · EA · GW

I'd like to see some justification for using this approach over the myriad of more responsible ways of generating random draws.

comment by SamDeere · 2019-01-25T18:03:41.495Z · score: 6 (4 votes) · EA · GW

The draw should have the following properties:

  • The source of randomness needs to be generated independently from both CEA and all possible entrants
  • The resulting random number needs to be published publicly
  • The randomness needs to be generated at a specific, precommitted time in the future
  • The method for arriving at the final number should ideally be open to public inspection

This is because, if we generated the number ourselves, or used a private third party, there are no good guarantees against collusion. Entrants in the lottery could reasonably say 'how do I know that the draw is fair?', especially as the prize pool is large enough that it could incentivise cheating. The future precommitment is important because it guarantees that we can't secretly know the number, and the specific timing is important because it means that we can't just keep waiting for numbers to be generated until we see one that we like the look of.

The method proposed above means that anyone can see how we arrived at the final random number, because it takes a public number that we can't possibly influence, and then hashes it using SHA256, which is well-verified, deterministic (i.e. anyone can run it on their own computer and check our working) and distributes the possible answers uniformly (so everyone has an equal chance of winning).

Typical lottery drawings have these properties too: live broadcast, studio audience (i.e. they are publicly verifiable), balls being mixed and then picked out of a machine (i.e. an easy-to-inspect, uniformly-distributed source of randomness that, because it is public, cannot be gamed by the people running the lottery).

Earthquakes have the nice property that their incidence follows a rough power law distribution (so you know approximately how regularly they'll happen), but the specifics of the location, magnitude, depth or any other properties of any given future earthquake are entirely unpredictable. This means that we know that there will be a set of unpredictable (i.e. random) numbers generated by seismometers, but we (and anyone trying to game the lottery) have no way of knowing what they will be in advance.

(This is not actually that different to how your computer generates randomness — it uses small unpredictable events, like the very precise time between keystrokes, or tiny changes in mouse direction, to generate the entropy pool for generating random numbers locally. We're just using the same technique, but allowing people to see into the entropy pool).

Other plausible sources of randomness we considered included the block hash of the first block mined after the draw date on the Bitcoin blockchain, and the numbers of a particular Powerball drawing.

comment by Paul_Christiano · 2019-01-25T18:01:53.688Z · score: 2 (1 votes) · EA · GW

I'm not sure what the myriad of more responsible ways are. If you trust CEA to not mess with the lottery more than you trust IRIS not to change their earthquake reports to mess with the lottery, then just having CEA pick numbers out of a hat could be better.

It definitely seems like free-riding on some other public lottery drawing that people already trust might be better.

comment by richard_ngo · 2019-01-25T11:10:37.974Z · score: 2 (2 votes) · EA · GW

Can you give some examples of "more responsible" ways?

I agree that in general calculating your own random digits feels a lot like rolling your own crypto. (Edit: I misunderstood the method and thought there was an easy exploit, which I was wrong about. Nevertheless at least 1/3 of the digits in the API response are predictable, maybe more, and the whole thing is quite small, so it might be possible to increase your probability of winning slightly by brute force calculating possibilities, assuming you get to pick your own contiguous ticket number range. My preliminary calculations suggest that this method would be too difficult, but I'm not an expert, there may be more sophisticated hacks).

comment by beth · 2019-01-26T12:36:55.100Z · score: 5 (4 votes) · EA · GW

My troubles with this method are two-fold.

1. SHA256 is a hashing algorithm. Its security is well-vetted for certain kinds of applications and certain kinds of attacks, but "randomly distribute the first 10 hex digits" is not one of those applications. The post does not include so much as a graph of the distribution of what the past drawing results would have been with this method, so CEA hasn't really justified why the result would be uniformly distributed.

2. The least-significant digits in the IRIS data can probably be manipulated by adversaries. It is hard to check them, and IRIS has no reason to secure their data pipeline against attacks that might cost tens of thousands of dollars, because there are normally no stakes whatsoever attached to those bits.

Random.org is exactly in the business that we're looking for, so they'd be a good option for their own institutional guarantee. Otherwise, any big lottery in any country will work as a source of randomness: the prizes there are bigger, which means that, even if these lotteries could be corrupted, nobody would waste that ability on rigging the donor lottery.

comment by SamDeere · 2019-01-29T00:30:30.310Z · score: 2 (2 votes) · EA · GW

Re 1, this is less of a worry to me. You're right that this isn't something that SHA256 has been specifically vetted for, but my understanding is that the SHA-2 family of algorithms should have uniformly-distributed outputs. In fact, the NIST beacon values are all just SHA-512 hashes (of a random seed plus the previous beacon's value and some other info), so this method vs the NIST method shouldn't have different properties (although, as you note, we didn't do a specific analysis of this particular set of inputs — noted, and mea culpa).

However, the point re 2 is definitely a fair concern, and I think that this is the biggest defeater here. As such, (and given the NIST Beacon is back online) we're reverting to the original NIST method.

Thanks for raising the concerns.

ETA: On further reflection, you're right that it's problematic knowing whether the first 10 hex digits will be uniformly distributed given that we don't have a full-entropy source (which is a significant difference between this method and the NIST beacon — we just made sure that the method had greater entropy than the 40 bits we needed to cover all the possible ticket values). So, your point about testing sample values in advance is well-made.
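
A rough way to run that kind of check is to tally the leading hex digit of the hash over a batch of sample inputs. The sketch below is illustrative only: the function name `tally_first_digit` is hypothetical, the inputs are the integers 1..N as digit strings (an assumption standing in for real IRIS responses), and `sha256sum` from GNU coreutils is assumed to be available. Since 10 hex digits encode a 40-bit number, each leading digit corresponds to one sixteenth of the range 0 to 2^40 - 1, so roughly equal counts are what a uniform distribution would look like.

```shell
#!/bin/bash

# Hypothetical helper: hash the digit strings 1..N (each followed by a
# newline) and count how often each leading hex digit occurs.
# The synthetic inputs are an assumption, not real earthquake data.
tally_first_digit() {
  local n=$1 i
  for ((i = 1; i <= n; i++)); do
    # keep only the first hex character of each digest
    printf '%s\n' "$i" | sha256sum | cut -c 1
  done | sort | uniq -c
}
```

Calling `tally_first_digit 1600` prints one line per leading digit with its count. A sample of synthetic inputs says nothing about whether the real inputs can be manipulated, which is a separate concern.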

comment by Paul_Christiano · 2019-01-25T17:54:59.160Z · score: 3 (2 votes) · EA · GW

There is plenty of entropy in the API responses, that's not the worst concern.

I think the most serious question is whether a participant can influence the lottery draw (e.g. by getting IRIS to change low order digits of the reported latitude or longitude).

comment by SamDeere · 2019-01-25T17:11:38.170Z · score: 3 (5 votes) · EA · GW

Agree with the sentiment, but we're most definitely not rolling our own crypto. The method above relies on the public and extremely-widely-vetted SHA256 algorithm. This algorithm has the nice property that even slightly different inputs produce wildly different outputs. Secondly, it should distribute these outputs uniformly across the entire possibility space. This means that it would be useless to brute-force the prediction, because each of your candidates would have an even chance of ending up basically anywhere.

For example, compare the input strings 1111111111111111111111111111 and 1111111111111111111111111112 with their SHA256 outputs:

sha256(1111111111111111111111111111)
  = fe16863cfd4015c58da63aa5d2fe80e6e1fcd0bbdd57296fe28844cc7d79581b


sha256(1111111111111111111111111112)
  = b74822540995e7aa1b50a4d9d23a4b13aff99910c3c2111b9bf649e947e5f49c

It doesn't matter how much of the API response remains the same (for example, we could pad the input of every hash we generated with the same fixed string and have the same randomness properties as the proposal above). All that matters is that each response is going to be (unpredictably) different from the next.
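
To reproduce a comparison like the one above on your own machine, something along these lines should work. This is a sketch, assuming the two inputs are hashed as plain ASCII strings with no trailing newline and that `sha256sum` (GNU coreutils) is installed; the helper name `sha256_hex` is hypothetical.

```shell
#!/bin/bash

# Hypothetical helper: hash a string exactly as given (no trailing
# newline) and print only the hex digest.
sha256_hex() {
  printf '%s' "$1" | sha256sum | cut -d ' ' -f 1
}

a=$(sha256_hex 1111111111111111111111111111)
b=$(sha256_hex 1111111111111111111111111112)
echo "$a"
echo "$b"

# Count how many of the 64 hex positions differ between the two digests.
diff=0
for ((i = 0; i < ${#a}; i++)); do
  if [ "${a:i:1}" != "${b:i:1}" ]; then
    diff=$((diff + 1))
  fi
done
echo "$diff of ${#a} hex digits differ"
```

If the digests were instead computed over the string plus a newline (as `echo` would produce), the values will differ from these, so it is worth checking which convention was used before comparing.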

ETA: It's perhaps more helpful to see the digits from the API response as a publicly verifiable seed to a pseudorandom number generator, rather than as the random number itself.