codebook:

for context — please visit this page

now, this is intended to be an ethical awareness project, so i’ve taken some reasonable steps toward anonymity:

  • 1) i’ve stripped the data of students’ names and replaced them with their initials

  • 2) if a student’s set of initials is effectively unique in the class (i.e. fewer than three students in total share it), i have removed their last initial (those entries are found at the bottom of the table). in this way, each individual can identify themselves but no outside party can (i.e. look for your own initials and your score). a sketch of this rule appears after this list.

  • 3) all of the initials will be encrypted (cipher-shifted with a secret key) at the end of the week, so none of this information is permanent, regardless of its anonymity
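
a minimal sketch of rules 2) and 3) above, assuming the records are simple (name, score) pairs; every name here (`to_initials`, `anonymize`, `shift_cipher`, the example inputs, the key) is an illustrative stand-in, not the exact pipeline:

```python
from collections import Counter

def to_initials(name: str) -> str:
    """'jane quinn doe' -> 'JQD'."""
    return "".join(part[0].upper() for part in name.split())

def anonymize(records: list[tuple[str, int]]) -> list[tuple[str, int]]:
    """replace names with initials; when fewer than three students in
    total share a set of initials, drop the last initial as well."""
    initials = [to_initials(name) for name, _ in records]
    counts = Counter(initials)
    return [
        (ini[:-1] if counts[ini] < 3 else ini, score)
        for ini, (_, score) in zip(initials, records)
    ]

def shift_cipher(text: str, key: int) -> str:
    """caesar-shift each uppercase letter by `key` positions. illustrative
    only: a shift cipher is obfuscation, not real encryption."""
    return "".join(chr((ord(c) - ord("A") + key) % 26 + ord("A")) for c in text)

pairs = [("jane quinn doe", 41), ("john q. dory", 88), ("alex bell", 63)]
print(anonymize(pairs))  # [('JQ', 41), ('JQ', 88), ('A', 63)]
```

the `< 3` threshold is exactly the “at least two others” rule: a reader can still find their own row, but an outsider cannot reliably pin a score to a person.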

other sidebars:

  • 1) under the actual usersearch api on datamatch, students’ names are absolutely not stripped; they appear right alongside their data. i removed them here only to protect the same privacy i’m trying to draw attention to.

  • 2) i wouldn’t put others through something i wouldn’t do myself. my rice purity score is 26 (sorry mom).

  • 3) i think rice purity is the optimal metric to make an example of. it’s information one might want to keep somewhat private (given it is an index of criminal/drug/sexual tendencies), but there really are no severe career, social, or safety consequences attached to people knowing it. i of course refuse to publish others’ gender identities or locations, because those do carry such ramifications. what is scary, however, is that all of that information is accessible in the exact same place: you pull one profile and all of it comes with it (see the sketch after this list).

  • 4) i have data access for all ~30,000 users, but am currently choosing to publish only harvard freshmen, out of intentional restraint. i want to make a point and draw attention to the need for data privacy, but in the most reasonable way possible.
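
to make sidebars 1) and 3) concrete, here is a rough sketch of one pulled profile versus what gets published; the field names below are hypothetical stand-ins, not datamatch’s actual schema:

```python
# hypothetical shape of one pulled profile; field names are stand-ins,
# not datamatch's real schema. the point: a single lookup bundles the
# genuinely dangerous fields (gender identity, location) with the
# merely embarrassing one (the score).
profile = {
    "name": "jane quinn doe",   # not stripped in the real api (sidebar 1)
    "rice_purity": 41,
    "gender_identity": "...",   # withheld here: real safety ramifications
    "location": "...",          # withheld here: real safety ramifications
}

# what the codebook rules above actually publish
published = {"initials": "JQ", "rice_purity": profile["rice_purity"]}
```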

[the following section has been redacted]

cracking datamatch — rice purity scores of all harvard freshmen who submitted one