Q: How many votes do recounts change? A: About 5 per 1000.

by: Grebner

Tue Feb 18, 2014 at 17:31:21 PM EST


I’ve been thinking about building a statistical model of recounts, to make it possible to estimate the probability that the outcome of an election would be reversed under various conditions.  It’s impossible to do a rigorous test, since it’s not clear what statistical model would be appropriate, and we don’t have adequate data for one anyway.


Now, an organization called the Michigan Election Reform Alliance has painstakingly conducted an unofficial recount of the ballots cast in two elections in Allegan County, as part of their larger examination of voting in Michigan.  Using Michigan’s FOIA, they obtained access to the ballots voted in the November 2008 and August 2012 elections, tallied some 135,000 individual votes by hand, and compared their totals to the official tallies.

MERA is interested in the big picture: reforming the entire election process, which is a wonderful idea.  But as a small-picture guy, I borrowed their data to try to estimate the probability that under various conditions a recount would reverse the initially announced results of a very close race.

Here’s the bottom line:  when the ballots are re-checked, there are about five random changes per thousand ballots.  Since some of the errors being corrected actually cancelled each other, the likely net change can be estimated to be: SQRT(ballots/200).

In addition, as large numbers of ballots are counted, the Democratic candidate in any given two-party contest tends to gain about 0.2 votes for each 1000.  In small recounts (say, fewer than 100,000 ballots) the random effect is the only thing that counts.  In larger recounts (congressional or statewide) the tendency for a recount to benefit the Democrat becomes more important while the random effects tend to cancel out.

Before the recount begins in earnest, the election officials look for errors that involved mishandling of groups of ballots - which might have been counted twice, or not counted at all - and for numerical errors such as mistakes in copying numbers or adding them together.  I don’t know of any data set which allows such arithmetic errors to be modelled - they might be large or small, and in small districts they probably won’t appear.  These “bulk errors” need to be corrected and included in the tallies before we talk about the effects of a recount.

Once we have solid totals for the two candidates, we can estimate the impact of the random effect - the net number of votes likely to shift from one candidate to the other - by dividing the number of ballots by 200 and taking the square root.  (This corresponds to “the standard error”.)  If we double that number, we get a reasonable idea of the largest likely change.  (I.e.: a 95% confidence interval.)

Second, in a partisan general election, a careful recount is likely to increase the Democratic candidate’s share by about 20 votes per 100,000 cast, mainly because the scanning machine overlooked ballots marked by voters who didn’t follow instructions very well - perhaps using the wrong pencil, or failing to fill in the area completely.  In other cases, ballots which had been disqualified because they apparently showed too many votes for a given office are “rehabilitated” when the extra “vote” turns out to be a smudge or crease mark.
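To make those two rules of thumb concrete, here is a minimal sketch of the arithmetic in Python (the post itself contains no code; the function names are mine, and the default rates are just the ones quoted above):

    import math

    def random_effect_se(ballots, changes_per_1000=5.0):
        # Standard error of the net change from random per-ballot corrections.
        # Assumes roughly `changes_per_1000` corrections per 1000 ballots, most
        # of which cancel; for the default rate this equals SQRT(ballots/200).
        return math.sqrt(ballots * changes_per_1000 / 1000.0)

    def largest_likely_change(ballots):
        # Rough 95% bound on the random swing: twice the standard error.
        return 2 * random_effect_se(ballots)

    def democratic_effect(ballots, net_gain_per_1000=0.2):
        # Expected net Democratic gain from a careful recount (~20 per 100,000 cast).
        return ballots * net_gain_per_1000 / 1000.0

    # At 100,000 ballots the random swing (~45 votes) dwarfs the Democratic drift (~20).
    print(largest_likely_change(100000), democratic_effect(100000))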

Let’s use a specific example:  the 2000 Congressional race in CD8, where the initial results showed Mike Rogers getting 145179 votes, to Dianne Byrum’s 145019, a difference of 160 votes.  If the ballots had been cast using the optical scan system that is currently in use in Michigan (the election was actually conducted using punchcards) this model says that the largest Byrum gain to be expected would have been

Random effect: 2 * SQRT((145179+145019)/200) = 76 votes

Specific Democratic effect: (145179+145019) * 0.00020 = 58 votes.


In the actual recount, Byrum’s position improved by roughly 50 votes, leaving her 111 short.

Because that election was held using punch-cards, this is an illustration rather than a literal analysis.  But if the same election occurred again, my analysis suggests the gap is probably too large to overcome merely by correcting random individual-ballot errors - the result would probably not be overturned unless the margin were first narrowed by the discovery of a bulk error.
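Plugging the CD8 totals into the same arithmetic, as a self-contained check (the candidate totals are from the post; the variable names are mine):

    import math

    rogers, byrum = 145179, 145019
    ballots = rogers + byrum                      # ~290,000 votes in the two-party race

    random_bound = 2 * math.sqrt(ballots / 200)   # largest likely random swing: ~76 votes
    dem_effect = ballots * 0.0002                 # expected Democratic gain: ~58 votes
    margin = rogers - byrum                       # 160 votes

    print(round(random_bound), round(dem_effect), margin)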



Now, let’s dig deeper into MERA’s data.  They studied the ballots cast in a collection of precincts in two Allegan County elections.  First, they hand-tallied the ballots cast in November 2008 in 17 precincts, looking at votes for 36 candidates running for the four statewide education boards (State Bd. of Ed, UM, MSU, Wayne State).  Second, they looked at votes cast for fifteen Republican candidates, running for seven offices, in twelve precincts in the August 2012 primary.  Altogether, they tallied some 135,000 votes.

Obviously, Allegan County isn’t perfectly typical of Michigan, and we can’t say for sure that we’d see exactly the same patterns elsewhere.  But as Donald Rumsfeld would have said, you have to conduct your analysis with the data you have, not the data you want.  And this data seems to match pretty well what I’ve seen elsewhere.

The statistical analysis of these results is far too long to include here; I will only touch on major results.  

I broke the 2008 general election results into three groups - Democrats, Republicans, and third-party.  Partly that was so I could derive multiple estimates of the error rates.  Partly it was to allow me to estimate any general bias against Democratic candidates.  And partly it was because there is good reason to believe that some of the errors are not statistically independent, and keeping separate tallies might protect me from certain kinds of mistakes.

The first two columns (“Official” and “Hand count”) simply reflect the election-night machine tally and the later count conducted by MERA.  The column labeled “Sum(errors)” shows the sum of the absolute values of the discrepancies found by MERA; whether a discrepancy was +2 or -2, it counts as 2 for this purpose.  The column labeled “variance” shows the square of the discrepancy.  The reason to use the square, rather than the actual discrepancy is to allow for the likelihood that in precincts large enough to include multiple errors, some of them cancel one another.  Use of the squared error means that large precincts and small precincts are treated equivalently.  Finally, “Net error” shows the overall effect of the tallying errors made, with positive errors offsetting the negative errors.
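For anyone who wants to reproduce those columns, here is a minimal sketch of the bookkeeping; the list of discrepancies (official tally minus hand count, per precinct and candidate) is made up purely for illustration:

    # Hypothetical discrepancies: official tally minus MERA hand count.
    discrepancies = [2, -1, 0, 0, 3, -2, 0, 1, -1, 0, 0, -2]

    sum_errors = sum(abs(d) for d in discrepancies)  # "Sum(errors)": +2 and -2 both count as 2
    variance = sum(d * d for d in discrepancies)     # "Variance": squares, so cancellations don't hide errors
    net_error = sum(discrepancies)                   # "Net error": positive and negative errors offset

    print(sum_errors, variance, net_error)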

First, notice that the variance is remarkably even - and high - amounting to roughly one incorrectly tallied vote per candidate for each 200 ballots counted.  That’s amazingly bad - on a ballot containing 100 candidates or proposal choices, you’d expect about one tallying error for every two ballots.  (It may be that the errors are concentrated on a relative handful of ballots, while the great majority are counted perfectly.  Our data doesn’t permit us to be sure, but that does not appear to be the case.)  That error rate is much worse than properly supervised hand-counting, and also worse than the much-maligned punchcard systems.
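The “one error for every two ballots” figure is just this arithmetic, assuming the errors are independent (the 100-contest ballot is the post’s hypothetical):

    import math

    error_rate_per_contest = 1 / 200   # one mis-tallied vote per candidate per 200 ballots
    contests_on_ballot = 100

    expected_errors = contests_on_ballot * error_rate_per_contest  # 0.5 errors per ballot
    p_at_least_one = 1 - math.exp(-expected_errors)                # ~39%, under independence

    print(expected_errors, round(p_at_least_one, 2))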

Second, notice that the “net error” is much less than the variance.  (For professional statisticians, recall that a Poisson variable’s variance and mean are equal.)  This tells us that the great majority of errors are random, and we don’t appear to have found an attempt to steer votes from one party to the other.  This randomness showed up in each of the analyses I performed.  The problem is sloppiness, not dishonesty.

I calculated the apparent pro-Republican bias that was uncovered by MERA’s tally, but it falls well below statistical significance.  I base my crude “guesstimate” of 20 lost Democratic votes/100,000 tallied on a mishmash of evidence including the Gore-Bush recount in Florida (before the Supremes put the kibosh on actually counting anything), the Byrum-Rogers race from 2000, Franken-Coleman (Minnesota, 2008), and Gregoire-Rossi (Washington State, 2004).

It appears that if you count 1,000,000 votes carefully, you generally discover about 200 additional net Democratic votes. Why should that be?  I don’t think it’s either deliberate or unconscious bias.  The effect is NOT concentrated in areas with Republican election clerks - it seems to be found equally in heavily Democratic areas which have Democratic officials.  The real cause appears to be that various Dem-leaning demographic groups are disproportionately likely to mark their ballots in ways that the scanners don’t read correctly.  Think about first-time voters, the visually handicapped, and people who are marginally literate - each of those groups is predominantly Democratic.  It’s probably not just ballots; I bet if we could get good statistics, Democrats are slightly more likely to renew their auto registrations late, or to send checks with transposed digits to the Secretary of State.  In any event, the effect is very small.  It only matters if you’re, say, 600 votes short in a state with six million votes cast.  (That would be Florida.)





                                        

November 2008 - 17 precincts - Allegan County

DEMOCRATIC CANDIDATES

Official   Hand Count   Sum(errors)   Variance   Net Error
49590      49595        119           257        -5

average variance: 5.2/1000 votes

REPUBLICAN CANDIDATES

Official   Hand Count   Sum(errors)   Variance   Net Error
53851      53842        125           251        9

average variance: 4.7/1000 votes

THIRD PARTY CANDIDATES

Official   Hand Count   Sum(errors)   Variance   Net Error
10486      10487        51            53         -1

average variance: 5.1/1000 votes

Net Republican bias: 0.13 votes/1000 votes


August 2012 - 12 precincts - Allegan County

Republican primaries

Official   Hand Count   Sum(errors)   Variance   Net Error
15287      15308        61            89         -21

average variance: 5.8/1000 votes
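One way to see that the apparent pro-Republican tilt “falls well below statistical significance”: under the independence assumption, the standard error of each net figure is roughly the square root of the corresponding variance column.  A sketch using the November 2008 numbers above:

    import math

    dem_net, dem_var = -5, 257    # official count was 5 votes low for the Democrats
    rep_net, rep_var = 9, 251     # official count was 9 votes high for the Republicans

    tilt = rep_net - dem_net              # 14 votes of pro-Republican tilt out of ~103,000
    se = math.sqrt(dem_var + rep_var)     # ~22.5 votes, if the errors are independent

    print(tilt, round(se, 1), round(tilt / se, 2))   # z is about 0.6: nowhere near significant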



The majority of the under and over votes are in the AV count (4.00 / 2)
because when you vote at the precinct, you have the opportunity to correct your ballot right there.  If an AV voter makes a stray pen or pencil mark, they probably won't send that ballot back and get a new one, but try and correct it with more marks or circling or drawing an arrow.  That is going to be an overvote as far as the machine is concerned and only in a handcount will that vote get sorted out.


That's exactly what Alan Fox tells me. (4.00 / 1)
Alan has supervised a bunch of recounts, as a long-time member of Ingham County's Board of Canvassers.  He says, as a first approximation, that ALL of the miscounts are absentee votes.  For one thing, the polling places provide proper pens, so there's no reason to experiment with whatever can be found around the house.  For another, election day ballots are checked for over-votes before they're accepted, which causes most errors to be fixed.

The Allegan County tallies didn't separate absentee from walk-in votes, so I can't test that directly.  But precincts with a larger percentage of absentees did NOT seem to have a higher error rate in the MERA data.  Maybe that's just a statistical blip.  Maybe Allegan was atypical.  Or maybe walk-in voting creates just as many badly tallied ballots as absentee voting.  The only method I know to choose among those hypotheses is to wait for proper data.


[ Parent ]
AV v. walk-in as sources of missed votes (0.00 / 0)
Direct observation by election monitors revealed that missed votes in Allegan could well be greater for walk-in election-day voters than for AV voters: Allegan precincts were observed copying AV ballots to correct overvotes caused by stray marks or folds, and undervotes due to faint marks, whereas on election day some precincts were observed using pencils, which invites faint marks. Also, some walk-in voters' ballots with errors were clearly not corrected. Because this data was anecdotal, the audit report (which I authored) did not make much of the situation. But from comments here, it appears to be peculiar to Allegan and perhaps a few other mostly rural or small-town counties.

[ Parent ]
Great post! (0.00 / 0)
Thank you for the information. Two thumbs up.

Great Lakes, Great Times.

This is excellent, important work by MERA (4.00 / 1)
and I am glad to see it discussed here.

Mark, could you give some further discussion of this point:

"The reason to use the square, rather than the actual discrepancy is to allow for the likelihood that in precincts large enough to include multiple errors, some of them cancel one another."

If I thought about it long enough and went back to my stat textbooks, I could probably justify this... but I can't on the spur of the moment.

Why is the square the correct estimator of the (non-canceled) error rate?


The short answer is that you're looking at variances. (4.00 / 1)
If two variables are statistically independent, the variance of their sum is the sum of their individual variances.  

Think of each ballot as a variable with a mean of 0.00 (let's define our variable as the change which results from a recount: the change in the number of votes cast for the Democrat minus the change in the number of votes cast for the Republican). We'll give it a variance of 0.005, which reflects 398/400 chances the variable is 0.000 because the ballot is unchanged when it is recounted, 1/400 that it's 1.000 (because the recount found it was really a Democratic ballot initially counted as spoiled), and 1/400 that it's minus 1.000 (a Republican gain).  Because the only non-zero values are 1 and -1, the variance equals the chance of any change at all: 2/400 = 0.005.  (The standard deviation is the square root of that, about 0.07.)

Now let's think of a precinct with 1000 votes, each of which is a tiny variable (mean 0.00, variance 0.005).  The mean of the composite variable will still be 0.00, and its variance will be 5, so its standard deviation will not be 5 but SQRT(5), about 2.24.  A histogram made by counting a large number of 1000-vote precincts would show a bell-shaped curve centered on zero, with large shoulders at minus 2, minus 1, 1 and 2, and rapid drop-offs both above and below those numbers.  That's because although we would expect 5 errors to be found, most of them on average will be cancelled by errors of the opposite sign, leaving an observed standard deviation equal to the square root of the expected number of errors.

Since the typical net error we observe from 1000-vote precincts will be only around 2, even though about 5 individual errors were made, we tally the squares of our observations and recover our true value of about 5.  This same process will yield consistent and unbiased estimates for precincts of any size, whereas looking at the absolute values of the errors will give us smaller and smaller estimates of the per-ballot error rate as our precincts get larger.

I'm sure that doesn't make any sense at all.  It took me several days to work it out, and a couple hours to write the above explanation.  Stat isn't easy.
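A quick simulation makes the same point; this is a minimal sketch of the per-ballot model described above, not anything MERA ran:

    import random

    def simulate_precinct(ballots=1000, p=1 / 400):
        # Each ballot flips +1 (Democratic gain) with probability p, -1 (Republican
        # gain) with probability p, and is unchanged otherwise.
        net = 0
        for _ in range(ballots):
            u = random.random()
            if u < p:
                net += 1
            elif u < 2 * p:
                net -= 1
        return net

    trials = 10000
    nets = [simulate_precinct() for _ in range(trials)]

    mean_abs = sum(abs(n) for n in nets) / trials  # typical net change: a bit under 2
    mean_sq = sum(n * n for n in nets) / trials    # recovers the expected error count: about 5

    print(round(mean_abs, 2), round(mean_sq, 2))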


[ Parent ]
Poisson variables (4.00 / 1)
My analysis above is perfectly solid, if we already know that the miscounted votes follow a Poisson distribution, which is to say that they are statistically independent of one another, and not prone to clumping in any of various possible ways.

Why do I assume I can use a Poisson model?  First, because it's easy to apply - which of course is a very poor excuse.  Second, because the data seems to fit that model - which is a better excuse.  And third, because after many tests, I was never able to find any pattern that violated the "null hypothesis" that the errors were statistically independent - which is the best excuse a statistician can muster.

I suspect if I am able to obtain a larger, very clean, data set, I'll discover that corrections are needed.  But even with corrections, the simple model I describe above will turn out to give reasonable approximations.
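For what it's worth, one such test is the standard Poisson dispersion check: if errors are independent and occur at a constant per-vote rate, the statistic below should come out near its degrees of freedom.  The per-precinct counts here are hypothetical, since the precinct-level data isn't reproduced in this post:

    # Hypothetical per-precinct absolute error counts and votes tallied.
    abs_errors = [7, 3, 9, 5, 4, 8, 6, 2, 10, 5, 6, 4]
    votes = [1400, 700, 1900, 1100, 800, 1600, 1200, 500, 2100, 900, 1300, 850]

    rate = sum(abs_errors) / sum(votes)              # pooled errors-per-vote estimate
    expected = [rate * v for v in votes]
    dispersion = sum((k - e) ** 2 / e for k, e in zip(abs_errors, expected))

    df = len(votes) - 1
    print(round(dispersion, 1), df)   # roughly chi-square with df degrees of freedom under the model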


[ Parent ]
in the Duggan primary election (0.00 / 0)
the recount changed exactly 9 votes.

America either revives its industrial base or it dies. There is no future for people in a post industrial America.  

That's because the election was rigged... (0.00 / 0)
Benny Napoleon actually won that race, but before anyone could count all the votes, Mike Duggan stormed the city clerk's office dressed like Robocop and used a flamethrower on the ballots.

Among the Trees

[ Parent ]
Recounts find missed votes (0.00 / 0)
One important point underlying this analysis is that recounts (whether formal or not) don't "change" votes.  They find votes that are missed by whichever collection of technology and application of law fails to count them the first time around.  

Sometimes a recount will in effect find votes that were electronically counted and should not have been, but generally they add a vote here and a vote there where markings are unclear or where state law regarding treatment of invalid write-in votes is misapplied.


The average change - at least in Allegan - was zero (0.00 / 0)
It's not clear which mechanisms have the most effect, but overall, the number of votes added, and the number of votes subtracted, seem to about average out.  That is, in both the Allegan data sets, I didn't find an overall tendency to increase the tallies within ANY of the comparisons I considered.

It would be interesting to conduct studies with the cooperation and participation of election officials, so we could run a large number of ballots through equipment repeatedly to find specific ballots which are mistallied.  I'm sure some classes - faint markings, for example - are undercounted.  Others - particularly creases that are incorrectly seen as intentional markings - cause excess votes.  There's no reason to think these tendencies are exactly equal and cancelling, but that's pretty much what I found within MERA's 135,000 rechecked votes.


[ Parent ]
2002 primary (4.00 / 1)
In the 47th district, there was a recount. Joe Hune was trailing County Commissioner Dave Domas by 1 vote before the recount. After the recount, he won by 2: 1,858 to 1,856.

http://miboecfr.nictusa.com/el... - About 11,000 voters.


"He who would trade liberty for some temporary security, deserves neither liberty nor security" - Benjamin Franklin

Opinions are my own and not that of LCRP


MERA's Allegan Audit (0.00 / 0)
The Allegan Audit report is now available on line at:
http://www.michiganelectionref...

