by Chris Bowers, Tue Nov 30, 2004 at 12:16:41 PM EST

*The revised BC-TX-Exit-Poll Excerpts showed that 20 percent, not 23 percent, of all Texas voters were Hispanic. They voted 50 percent for Kerry and 49 percent for Bush, not 41-59 Kerry-Bush.*

That's a pretty nifty 19-point swing, huh?

So, at the risk of being called arrogant once again by the likes of Mr. Morin and pollsters who (supposedly) know what they are doing: how is it possible to miss by that large a margin with a properly weighted exit poll, and not now have to adjust downward the share of the Hispanic vote that Mr. Mitofsky said Bush got on Election Night?

Mystery Pollster, of course, has a new, extensive piece on exit polls. This time Blumenthal explains the evolution of exit polls, explaining why early exits are different from others, what we can extrapolate from this and what we can't, and why it will be difficult for us to acquire the information we seek. The piece is huge, but I will try to excerpt a meaningful portion:

First a review of the process: On November 2, 2004, the NEP conducted separate exit polls in all 50 states and the District of Columbia plus a separate, stand-alone "national" sampling. NEP instructed its interviewers to call in to report their results three times on Election Day (all times are local): At about noon, at about 3:00 p.m. and roughly an hour before the polls closed. NEP started releasing tabulations of the vote preference question for the national survey and for most of the battleground states on an hourly basis beginning at 1:00 p.m. Eastern Time.

(...)

**Those who are concerned about the discrepancy between the exit polls and the actual vote count should focus only on #3, the tabulations prepared just before poll closing for each state.** Unfortunately, these were not officially released. The early releases (#1 & #2) were widely leaked and posted on the Internet, but had bigger discrepancies resulting from incomplete samples or the use of completely unweighted data. The later releases now available through official channels (#5) are not helpful for analysis of the discrepancy since they were "corrected" to conform to actual vote results. The only source of "just-before-poll-closing" results appears to be the data in the paper by Steven Freeman (more on this in the next post).

Third, discussing sampling error in exit polls, Nick Panagakis sent me the following email:

Exit Polls Vs. Election Outcomes

In the general debate comparing exit polls and election outcomes, there are two fundamental weaknesses in analyses of differences between exit polls and election outcomes. The weaknesses are: 1) calculating error between poll and election outcomes and 2) the effect of sample design in calculating sample error for cluster samples used for exit polls.

Here I use Steven Freeman's paper "The Unexplained Exit Poll Discrepancy" only as an example. This discussion applies to any of the exit poll vs. election outcome analyses which seem to come up after any election. Note that exit poll survey data are used here, not survey data weighted by actual election returns which would be redundant.

OUTCOME VS. EXIT POLL ERROR

Freeman: "The conventional wisdom going into this election was that three critical states would likely determine who would win the Presidential election - - Ohio, Pennsylvania, and Florida. In each of these states, however, exit polls differed significantly from recorded tallies." Freeman in Table 1 uses "Tallied vs. predicted" as his source data. In Ohio, Pennsylvania, and Florida, the differences between Bush's final tallies [outcomes] and his earlier exit poll percentages were, respectively, 6.7%, 6.5%, and 4.9%.

Differences between poll and election margins should not be used in statistical analysis. It is the poll estimate that is subject to sample error, not margins; e.g., 48% voting for A and 52% for B. Error on the margin effectively overstates estimate error by a factor of two. This also complies with National Council on Public Polls post-election poll analyses.

Elections are zero-sum games. Two points high for one candidate means two points low for the other. Vote estimate errors for each candidate are not additive, which is the effect of using margins in an analysis.
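To make the factor of two concrete, here is a minimal sketch using the 48%/52% example above. The 50-50 outcome and the function name are assumptions chosen purely for illustration.

```python
def estimate_and_margin_error(poll_a, poll_b, actual_a, actual_b):
    """Return (error on A's estimate, error on B's estimate, error on the A-minus-B margin)."""
    err_a = actual_a - poll_a
    err_b = actual_b - poll_b
    # The margin error is the change in (A - B), which equals err_a - err_b.
    margin_err = (actual_a - actual_b) - (poll_a - poll_b)
    return err_a, err_b, margin_err


# Hypothetical two-way race: the poll shows 48-52, the count comes in 50-50.
err_a, err_b, margin_err = estimate_and_margin_error(48.0, 52.0, 50.0, 50.0)
print(f"A: {err_a:+.1f}  B: {err_b:+.1f}  margin: {margin_err:+.1f}")
```

In a two-way race the two estimate errors mirror each other, so the margin error is twice the error on either candidate's estimate. Comparing it against a sampling error computed for a single estimate overstates the miss.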

The differences between exit poll estimates and final election outcomes in these key states subject to tests of significance are as follows:

Ohio Bush: Exit poll 47.9%; outcome 51.0%. Difference +3.1

Ohio Kerry: Exit poll 52.1%; outcome 48.5%. Difference -3.6

Pennsylvania Bush: Exit poll 45.4%; outcome 48.6%. Difference +3.2

Pennsylvania Kerry: Exit poll 54.1%; outcome 50.8%. Difference -3.3

Florida Bush: Exit poll 49.8%; outcome 52.1%. Difference +2.3

Florida Kerry: Exit poll 49.7%; outcome 47.1%. Difference -2.6

Differences between poll estimates and election outcomes range from 2.3 to 3.6 points in absolute value, not 4.9% to 6.7%.
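The per-candidate differences listed above can be reproduced with a short sketch. The figures are the exit poll and outcome percentages from the list; the variable names are illustrative only.

```python
# (exit poll %, outcome %) for each state and candidate, as listed above.
results = {
    ("Ohio", "Bush"): (47.9, 51.0),
    ("Ohio", "Kerry"): (52.1, 48.5),
    ("Pennsylvania", "Bush"): (45.4, 48.6),
    ("Pennsylvania", "Kerry"): (54.1, 50.8),
    ("Florida", "Bush"): (49.8, 52.1),
    ("Florida", "Kerry"): (49.7, 47.1),
}

# Difference = outcome minus exit poll estimate, rounded to one decimal.
differences = {
    key: round(outcome - poll, 1) for key, (poll, outcome) in results.items()
}

for (state, candidate), diff in differences.items():
    print(f"{state} {candidate}: {diff:+.1f}")

abs_diffs = [abs(d) for d in differences.values()]
print(f"smallest miss: {min(abs_diffs):.1f}, largest miss: {max(abs_diffs):.1f}")
```

Measured this way, every miss is an error on a single candidate's estimate, which is the quantity a sampling error calculation applies to.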

EXIT POLL STATISTICAL ERROR

The conclusion that "exit polls differed significantly from recorded tallies" in the three states is incorrect.

However, Freeman's page 6 footnote is correct: "This analysis assumes a simple random sample. If on the other hand, states were broken into clusters (e.g., precincts) and then the clusters (precincts) were randomly selected (sampling individuals within those selected precincts), the variances would increase."

By necessity, exit poll samples are cluster samples. The number of precincts in states typically number in the thousands. Wisconsin, for example, has 3,700 precincts. Illinois, a larger state, has 10,000.

Standard error assuming a simple random sample is calculated, but only as a first step. A confidence level of at least 99% is assumed - higher than the customary 95% - probably because of the higher standard of precision for exit polls and the number of races involved, about 100 across the states including the race for president and races for senate and governor on November 2.

A measure called the Design Effect must then be calculated to adjust the standard error for the cluster sampling effect. The magnitude of the Design Effect depends on the average number of interviews per precinct in each state sample. The smaller the average number of interviews per precinct in a state, the smaller the Design Effect. Design Effect also differs by characteristic and can be much larger for characteristics highly clustered by precinct, such as race. Design Effect is a variance measure, so its square root is used to multiply the standard errors.

Without knowing the number of precincts sampled, you can't calculate the Design Effect. But Design Effect square roots are said to have typically ranged from 1.5 to 1.8 in the November exit poll. I used 1.6 as a "best estimate".
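For readers who want to see how the Design Effect moves with interviews per precinct, here is a rough sketch using Kish's common textbook approximation, deff = 1 + (m - 1) * rho. This is not necessarily the formula NEP used, and the intraclass correlation `rho = 0.04` and the cluster sizes are hypothetical values picked for illustration.

```python
import math


def design_effect(avg_interviews_per_precinct, rho):
    """Kish approximation: deff = 1 + (m - 1) * rho.

    m   = average number of interviews per sampled precinct
    rho = intraclass correlation (how alike voters within a precinct are)
    """
    return 1.0 + (avg_interviews_per_precinct - 1.0) * rho


def adjusted_standard_error(srs_standard_error, deff):
    """Design effect is a variance ratio, so the standard error scales by its square root."""
    return srs_standard_error * math.sqrt(deff)


# Hypothetical inputs: rho = 0.04 and three average cluster sizes.
for m in (20, 40, 60):
    deff = design_effect(m, rho=0.04)
    print(f"m={m}: deff={deff:.2f}, sqrt(deff)={math.sqrt(deff):.2f}")
```

Fewer interviews per precinct give a smaller Design Effect, and a more strongly clustered characteristic (a larger `rho`) gives a larger one, which matches the description above.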

Conclusion. All of the state estimates above are well within their error calculations below.

Ohio, n = 2020. Sqrt (.5 X .5) / Sqrt 2020 X 2.6 X 1.6 = +/- 4.6%.

Pennsylvania, n = 2107. Sqrt (.5 X .5) / Sqrt 2107 X 2.6 X 1.6 = +/- 4.5%.

Florida, n = 2862. Sqrt (.5 X .5) / Sqrt 2862 X 2.6 X 1.6 = +/- 3.9%.
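The three calculations above follow one recipe: the simple-random-sample standard error at p = 0.5, times a confidence multiplier of 2.6, times the Design Effect square root of 1.6. A minimal sketch follows; the function and parameter names are assumptions for illustration, while the multipliers and sample sizes are the ones given above.

```python
import math
from statistics import NormalDist


def exit_poll_margin(n, z=2.6, deff_sqrt=1.6, p=0.5):
    """Margin of error for one candidate's estimate, as a proportion.

    n         = number of interviews in the state sample
    z         = confidence multiplier (2.6 is the value used in the text)
    deff_sqrt = square root of the Design Effect (1.6 is the text's best estimate)
    p         = assumed proportion; 0.5 gives the widest margin
    """
    srs_standard_error = math.sqrt(p * (1.0 - p)) / math.sqrt(n)
    return srs_standard_error * z * deff_sqrt


samples = {"Ohio": 2020, "Pennsylvania": 2107, "Florida": 2862}
for state, n in samples.items():
    print(f"{state}: +/- {100 * exit_poll_margin(n):.1f}%")

# Exact two-sided 99% multiplier from the normal distribution, for comparison with 2.6.
z_99 = NormalDist().inv_cdf(0.995)
print(f"two-sided 99% z: {z_99:.3f}")
```

The exact 99% multiplier is about 2.58, which the calculations round up to 2.6. The largest single-candidate miss in these three states was 3.6 points, which sits inside all three margins.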

Nick Panagakis

