ChatterGrabber: Thoughts on Ethical Surveillance

The changes

Edward Snowden changed the world in 2013. In one fell swoop, we discovered in unison that every conspiracy theory was true, every notion of privacy was false, and our entire threat surface had been compromised. For those ’90s techno-paranoids who grew up watching The X-Files, these disclosures brought a sigh of relief: confirmation that the world was indeed as we had surmised, and that our past cautions may have been justified. For those of us beholden to the just-world fallacy, this time marked a shocking violation of our faith in institutions. For everyone else, though, it was time for a chit-chat.

Fast forward to today: we’ve had that chat, and the privacy advocates lost. Most of us have surrendered to the convenience of learned helplessness and accepted that our entire personal life is sitting on a hard drive in a marketing firm in New York, waiting to offer banal goods and services to suit our most fervent, private desires. Likewise, the penetration of social media is near absolute. Now, rather than offering comprehensive forums for sharing media and sentiments, apps seek to differentiate themselves by their limitations. Would you like to live in the moment, to share your whims with friends and family without fear of establishing an accidental chronology? Try Snapchat, where every message vanishes upon viewing. Are you looking for a like-minded romantic partner, but only one who shares your favorite times and places? Happn will gladly assist in exchange for a detailed log of everywhere you have been.

 

The potential

The complete dissolution of our private lives is not without some boons. Whether these are being reaped by the right people in the right manner is a different question, but the potential for altruism is there. How, though, might one altruistically leverage such information? Social media is at once instantaneous and borderless. For the first time in human history, one may watch the flow of thoughts and feelings across an entire continent with a humble laptop and a free public API account. One can find the patients who can’t afford a doctor and instead mine the hive-mind of their social network for a presumptive diagnosis and an affordable regimen. One could also see the first whispers of a viral apocalypse before patient zero has been admitted and the strain has been characterized.

As exciting as these prospects are, one must recognize that social media data is inherently noisy. Machine learning gets cheaper and better every day, so the question is not what may be done, but the scope within which it may be done well. It’s hard to match a signal to a single point without egregiously violating the privacy of that point’s real-world counterpart. It’s also shockingly difficult to generalize these assumptions to new people, times, and places. Language is living, and words are subject to diminishing marginal utility: the rich and the poor, the young and the old, perpetually strive through visual and linguistic identifiers to maintain the appropriate distance from one another. A previous experiment in identifying tweets indicating firearm violence was for a while rendered useless by the “shots fired” meme. There will be many more memes.

These limitations informed the development of my Twitter epidemiological surveillance software, ChatterGrabber. With ChatterGrabber, I make no claims to predict an outbreak, to quantify the cases, or to find the Broad Street pump. ChatterGrabber is an alarm system: a ringing indicator that something appears to be happening and that a traditional investigation is in order.
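An alarm of this kind can be sketched as a simple threshold on daily counts of matching posts. The following Python is a minimal illustration, not ChatterGrabber’s actual detector: the name `flag_spikes`, the seven-day trailing window, and the three-standard-deviation threshold are all assumptions chosen for clarity.

```python
from statistics import mean, stdev


def flag_spikes(daily_counts, window=7, k=3.0):
    """Return the indices of days whose count exceeds a trailing baseline.

    Hypothetical sketch: a day is flagged when its count is greater than the
    mean of the previous `window` days plus `k` standard deviations. It only
    says "something appears to be happening"; it does not estimate cases.
    """
    alarms = []
    for i in range(window, len(daily_counts)):
        baseline = daily_counts[i - window:i]
        mu = mean(baseline)
        # Floor the spread at 1.0 so a perfectly flat baseline does not let a
        # single extra post trip the alarm.
        sigma = max(stdev(baseline), 1.0)
        if daily_counts[i] > mu + k * sigma:
            alarms.append(i)
    return alarms
```

A flagged day is only a prompt for a traditional investigation; the function makes no attempt to estimate case counts or to locate a source.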

Don’t make it personal

ChatterGrabber was developed under the ethical guidelines proposed by Caitlin Rivers’s 2013 work, “A framework for ethical use of twitter for public health research,” and these guidelines present a strong starting point for comparable efforts. In essence, digital surveillance work should be held to the same standards as physical surveillance. Most would not be particularly bothered to see cameras strewn throughout a retail store; it’s part and parcel of our times. Most would, however, be extremely alarmed to see a clipboard-wielding scientist following their every movement, jotting down notes about which displays they visited and for how long. Depersonalized measurement of mass behaviors of interest is one thing. Tracking individuals unawares, combining data sources on said individuals, and building detailed dossiers of identifying information is another; it would surpass any common threshold for creepiness.

Second, robust sanitization of personally identifying data is key. There’s a saying in finance: “Write every email like it’s going to be on the front page of the New York Times tomorrow.” A wise researcher should collect data under the assumption that every collaborator will (accidentally) hit that send-all button at the worst possible moment. If you’ve collected dossiers on a thousand strangers, every moment is the worst possible moment.
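To make the sanitization point concrete, here is a minimal Python sketch, assuming each collected post arrives as a plain dictionary with hypothetical `created_at`, `text`, `user`, and `coordinates` keys (coordinates as a longitude, latitude pair). It is not ChatterGrabber’s code; it only illustrates copying forward the few fields you need rather than trying to delete the ones you don’t.

```python
import re

MENTION = re.compile(r"@\w+")
URL = re.compile(r"https?://\S+")


def sanitize(record, precision=1):
    """Return a new dict holding only non-identifying fields of `record`.

    Field names are hypothetical. Keeps the timestamp, the text with
    @mentions and links masked, and coordinates rounded to `precision`
    decimal places (on the order of 10 km at 1). Everything else (user IDs,
    handles, profile data) is simply never copied.
    """
    text = MENTION.sub("@user", record.get("text", ""))
    text = URL.sub("<url>", text)
    clean = {"created_at": record.get("created_at"), "text": text}
    coords = record.get("coordinates")
    if coords is not None:
        lon, lat = coords
        clean["coordinates"] = (round(lon, precision), round(lat, precision))
    return clean
```

Building the output from an allow-list means a new field added upstream is dropped by default instead of leaking by default.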

 

Back to the present

Public health, and science in general, has a tendency to be data greedy. Faced with stringent guidelines on HIPAA-protected data and strict paywalls on the rest, there’s a tendency to grab as much as possible when the opportunity presents itself. This is where the real-world analogy applies once more. No one would ever object if you called the fire department when their smoke alarm went off; people expect help from institutions when crises are apparent. Most would, however, flatly refuse an offer to assess their risk of fire, especially when real-world penalties are at play. The concept of pre-crime is offensive to our most basic notions of freedom. This is a vital consideration for public health surveillance. One must be mindful to seek the signs of the disease itself, not the long run of personal choices leading to an inevitable outcome.

Per the previous section, proper sanitization will serve one well here. That which is not collected cannot be abused. There will always be broken-windows adherents who seek to punish every transgression. A well-intended researcher’s digital approximation of Santa’s naughty list will make their job much, much easier.

 

Sensitive data

If your data collection activities are to be supported by the public, it’s important to realize that there are some activities the public will never support. Cynically, one may wave the red, white, and blue flags of fighting bioterror, curing cancer, and saving golden retriever puppies to justify most any cause. However, the importance of personal choice and of protecting one’s own is paramount. New Yorkers sputtered with rage when mega-sized corn syrup and citric acid rations were pulled from their beloved bodegas. Bars feared losing their regulars when smoking sections breathed their last gasps. People cherish the freedom to trash their own and their children’s bodies as they see fit. Any perceived compromise of this freedom will be received poorly.

One run-in I had with such sensitivities was in coordinating with a regional emergency services organization seeking a Twitter dashboard for active shooters. While many in the room followed the traditional data-hungry approach, one naysayer shot down each potential project with something to the effect of “Do you want to answer the phones if we get hit with a FOIA on this?” The collaboration never moved forward. In another effort, I sought to explore the degree of social network connectedness amongst first responders, hoping to identify less connected, more fringe individuals who might suffer silently alone following critical incidents. Most reactions to this idea ranged somewhere from ambivalence to marginal interest, but one attendee turned my inbox into a river of fire, alleging that such an effort would absolutely betray the trust of everyone involved and would never pass an IRB review. The latter assertion was broadly untrue, but the former had some merit. Even if an effort is noble in its goals, even if the data used was freely available to the public, and even if the materials gathered could not possibly be used to harm others, perceptions matter. The erroneous belief that such a method collects private data was enough. Truly, when a script on a single laptop can find all the needles in all the haystacks, one must consider whether “publicly available” still serves as a suitable standard for privacy.

 

Devils in the details

Coding is hard; a lot of interesting ideas will never work. While there is no Hippocratic Oath for science, we do have a surprisingly convenient defense when pushed beyond the bounds of our conscience. Within the development of any project, opportunities will arise and ideas will perish. By crafting surveillance scripts, one is granted intimate control over what data structures may be pulled, what queries are used to pull them, and how they are parsed, paired, and pared on their way to storage. Software development presents unlimited opportunities for the sabotage or preemption of misuse. Your software will run faster if you only hold onto the metadata you need. If one forgoes retrospective review of personal histories and only tracks events of public concern, one frees up valuable resources for better characterizing those events, for tracking others, and for investigating more quickly when a signal spikes: less hay, more needles. Ethical surveillance is a boon not only to one’s conscience but to one’s software as well. And who needs to clog up their personal machine running the Panopticon anyway?
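The same logic can be pushed all the way to ingestion. Below is a minimal Python sketch, assuming a hypothetical stream of `(date, region, text)` tuples and a made-up symptom vocabulary: each message is tested against the pattern and then discarded, so only aggregate counts per day and region are ever stored. It illustrates the principle rather than reproducing ChatterGrabber.

```python
import re
from collections import Counter

# Hypothetical symptom vocabulary; a real deployment would need careful
# tuning, and retuning as slang and memes drift.
SYMPTOM_TERMS = re.compile(r"\b(fever|chills|vomiting|rash)\b", re.IGNORECASE)


def tally(stream):
    """Count matching messages per (date, region) without storing any message.

    Each item in `stream` is assumed to be a (date, region, text) tuple.
    Text and any identity are dropped as soon as the match test is done;
    only the aggregate counts survive.
    """
    counts = Counter()
    for date, region, text in stream:
        if SYMPTOM_TERMS.search(text):
            counts[(date, region)] += 1
    return counts
```

Keyword matching of this kind is crude (a post saying “no fever today” still counts), which is one more reason to treat the output as an alarm rather than a measurement.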