1.5 Discussion (Admitting The Mistakes)

This project isn’t perfect. The most obvious issue is the integrity of the data: since it was scraped from Wikipedia and IMDb, some cases are bound to be missing. In particular, the calculated death rate of queer men is likely lower than the true rate, since the list of dead queer men I drew the data from was less rigorously checked than the list of dead queer women. This is an issue for any analysis that relies on community-curated lists; I hope that the data used here is as accurate as possible under the circumstances.

It was impossible, or at least so difficult as to not be worth it, to find data on characters’ race, specific orientation (lesbian/bisexual/pansexual/etc.), and status as a regular or minor character. These variables had to be left out of the project, even though they might help explain why characters live or die.

Logistic regression could have been a valuable addition to the analysis, since the outcome variable is binary (died/lived). It was left out of this paper because finding outliers was the more important goal, and because logistic regression is much more difficult to explain.
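
For readers curious what that might have looked like, here is a minimal sketch of a logistic regression with a single yes/no predictor, written in plain Python. Everything in it is an assumption made for illustration: the `group` and `died` lists are made-up placeholder values, not the project's data, and `fit_logistic` is a hypothetical helper using simple gradient ascent rather than anything from the original analysis.

```python
import math

def sigmoid(z):
    """Map a log-odds value to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.5, steps=5000):
    """Fit P(y=1) = sigmoid(b0 + b1*x) by gradient ascent on the
    mean log-likelihood. Hypothetical helper, for illustration only."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = y - sigmoid(b0 + b1 * x)  # observed minus predicted
            g0 += err
            g1 += err * x
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1

# Placeholder data (NOT the project's data): 10 characters per group.
# group = 0 or 1 is a hypothetical yes/no trait; died = 1 means the character died.
group = [0] * 10 + [1] * 10
died = [1] * 2 + [0] * 8 + [1] * 5 + [0] * 5

b0, b1 = fit_logistic(group, died)
odds_ratio = math.exp(b1)  # how many times higher the odds of death are in group 1
print(f"intercept={b0:.3f}, slope={b1:.3f}, odds ratio={odds_ratio:.2f}")
```

The slope is on the log-odds scale, so it has to be exponentiated into an odds ratio before it can be read, which is part of why this approach is harder to explain than a simple comparison of death rates.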