cracking datamatch

why u should be more conscientious about putting private info on the internet — a case study in datamatch’s lack of security/privacy by pulling the rice purity scores of all freshman users at harvard

tl;dr do not keep putting your info into random apps that offer u a free lunch (cough cough datamatch and claim)!!! there is no such thing as a free lunch!!! if there is no product, u r the product!!! do not put info into the internet that u would not want ur mom or future boss to see!!!

context:

i was tipped off by a discovery of the api url that stores all datamatch user profile pictures (that anyone can access at any time). this gave us the hunch that the other data was insecure/easily accessible by anyone. and this is scary cause datamatch (a dating app for verified college students) has been around since 1994 and boasts hundreds of thousands of historic users. this year alone, nearly 50 colleges participated.

so i borrowed an account from a friend, opened up the datamatch api on my computer (to be clear, i am not a datamatch user myself, i practice what i preach), and with the help of some anon partners + my younger brother (he a genius), started tinkering around. 

we eventually figured out a couple of things:

  • 1) datamatch does not encrypt or otherwise protect most of ur data, despite giving the impression that many of these inputs will not be visible to other users

  • 2) not only do they not protect it, but they send said data to every single device that interacts with their api (e.g. if user a matches with or searches for user b, all of user b’s input data is rawdogged to user a’s device from the firebase backend)

  • 3) with this half-assed web architecture, anyone with 10 seconds can thus pull this sensitive/vulnerable user data from their personal device
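
to make point 2 concrete, here is a minimal python sketch of the difference between hiding fields in the ui and never sending them at all. everything in it is hypothetical: the field names, the `PUBLIC_FIELDS` allowlist, and the `public_view` helper are made up for illustration and are not datamatch's actual schema or code.

```python
# hypothetical allowlist of fields another user is allowed to see
PUBLIC_FIELDS = {"name", "year", "bio"}


def public_view(profile: dict) -> dict:
    """return a copy of the profile containing only allowlisted fields."""
    return {k: v for k, v in profile.items() if k in PUBLIC_FIELDS}


# invented example record, not real data
profile = {
    "name": "user b",
    "year": 2027,
    "bio": "hi",
    "purity_score": 73,      # private input
    "dorm": "example hall",  # private input
}

# leaky pattern: the whole record goes over the wire and the ui just
# chooses not to display the private fields
leaky_payload = profile

# safer pattern: private fields are dropped on the server before sending
safe_payload = public_view(profile)
```

with the leaky pattern, anyone who inspects the response on their own device sees `purity_score` and `dorm`. with the allowlist, those fields never leave the backend, so there is nothing to inspect.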

boring technical stuff:

if ur interested in the more technical terms: datamatch utilizes websockets (a protocol layered on top of tcp that allows continuous, full-duplex communication between a server and an end user). a connection opens with a ‘handshake’ (an http upgrade request), and after that, data is sent as messages from the user to the server and vice versa. in a well-built app, regardless of protocol, care is taken to strip, mask, or encrypt any private fields in the response data that the user receives. datamatch does none of these, and legit rawdogs lots of ur data into someone else’s computer.

sidebar: i am a hobby coder who only likes computer engineering cuz i play hella video games (i study gov). i have literally 0 interest in being a swe (although i think i would be a beast at it if i did lol). my only experience with protocols/networking is the ‘tcp/ip for dummies’ book my mom got me when i was 9 (b4 websockets existed). i told her i wanted to get into programming so she just bought a bunch of books that had anything to do with computers. shoutout mom u r the best.

i think this goes to show the minimal amount of effort devoted to privacy in the modern day, and it’s sad. if my terrible, ebay-foraged 2013 trash can of a mac desktop and baseline knowledge of networking can uncover the vulnerable user data of hundreds of thousands of college students, we’re in a really bad place.

findings:

what we found was terrifying but not surprising: literally all the private/personal user input data they store, from rice purity score to gender identity to location on campus, for all of the hundreds of thousands of people across all the dozens of schools that participated in datamatch.

how do u know im not lying? click here for a list of the self-reported rice purity scores of all the members of the harvard freshman class.

we also found a couple of other interesting things:

  • it appears that their search algorithm implicitly discriminates against names with diacritics (e.g. many hispanic or slavic names) because it isn’t built to handle accented characters

  • it also appears possible that the ppl who work at datamatch can/do rig the results (on some loveint type shit lol)
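
on the diacritics point: a common way to make name search tolerant of accents is to normalize both the query and the stored name before comparing them. this is a sketch using python's standard `unicodedata` module, not a description of datamatch's search code; `fold` and `matches` are hypothetical helper names.

```python
import unicodedata


def fold(text: str) -> str:
    """strip combining accent marks and case so 'José' and 'jose' compare equal."""
    # NFKD splits an accented letter into a base letter plus combining marks
    decomposed = unicodedata.normalize("NFKD", text)
    # drop the combining marks, keep the base letters
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def matches(query: str, name: str) -> bool:
    """accent-insensitive, case-insensitive substring match."""
    return fold(query) in fold(name)


# a naive comparison misses the accented name
naive_hit = "jose" in "José Núñez".lower()
# the folded comparison finds it
folded_hit = matches("jose", "José Núñez")
```

one limitation: letters like `ł` have no unicode decomposition, so they pass through `fold` unchanged and would need an explicit mapping on top of this.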

conclusion:

the important part here is simple. apps like datamatch may seem ‘worth it’ on the surface (given they offer u lil bits of $), but they really don’t protect ur data at all. they also have the potential to profit off ur data hella, as evidenced by the explicit declaration in their privacy policy of their right to sell ur shit. in essence: there’s no such thing as a free lunch, ever. corporations writ large offer u $ cause they get so much more back from ur data (whether it’s money or other gain). these ceos arent all mother teresa bruh.

apps like datamatch and claim also say they ‘de-identify’ ur data before they sell it, but thats a straight myth (talk to prof. latanya sweeney about it on campus, she’s the one who showed that ~87% of americans can be uniquely identified from just their zip code, birth date, and sex, which is why ‘de-identified’ data can so often be reidentified using public registries).
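
to see why removing names doesn't make data anonymous, here is a toy linkage sketch: it joins a ‘de-identified’ table against a public list on the fields they share. all records are invented, and the field names, `QUASI_IDS`, and the `reidentify` helper are assumptions for illustration only.

```python
# invented "de-identified" rows: no names, but quasi-identifiers remain
deidentified = [
    {"zip": "00001", "birth_date": "2005-01-02", "sex": "F", "purity_score": 61},
    {"zip": "00002", "birth_date": "2004-11-30", "sex": "M", "purity_score": 88},
]

# invented public registry (think voter roll or student directory)
public_registry = [
    {"name": "person one", "zip": "00001", "birth_date": "2005-01-02", "sex": "F"},
    {"name": "person two", "zip": "00002", "birth_date": "2004-11-30", "sex": "M"},
    {"name": "person three", "zip": "00002", "birth_date": "2003-05-05", "sex": "M"},
]

QUASI_IDS = ("zip", "birth_date", "sex")


def key(row: dict) -> tuple:
    """the combination of quasi-identifiers used to link the two tables."""
    return tuple(row[field] for field in QUASI_IDS)


def reidentify(anon_rows: list, registry: list) -> list:
    """attach a name to every anonymous row whose key is unique in the registry."""
    names_by_key = {}
    for person in registry:
        names_by_key.setdefault(key(person), []).append(person["name"])
    linked = []
    for row in anon_rows:
        names = names_by_key.get(key(row), [])
        if len(names) == 1:  # exactly one candidate means the row is re-identified
            linked.append((names[0], row["purity_score"]))
    return linked


linked = reidentify(deidentified, public_registry)
```

in this toy example both ‘anonymous’ rows get a name back, because each combination of zip, birth date, and sex points to exactly one person in the registry.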

in sum — please be more careful with ur data. don’t upload shit if u wouldnt want ur mom to see it. read this letter if u wanna know more about how ur data privacy is being weaponized against you — and how corporations are building up arsenals to tighten their grip on society. stay safe and may god bless u.