Monuments to failure – part 4 – knee-jerk reactions

Howdy all, in this column I continue an {n} part series exploring ideas and code of mine that, for one reason or another, did not pan out. It’s important to note that none of these ideas were directly tied to my research and funding, though any could have found their way in had the results been better. These posts broadly represent blind explorations of tools and concepts outside my discipline in an attempt to broaden my repertoire. In each entry I will explore the basic idea, why I tried it, what design considerations I made, and why I considered it a failure. I will also link or embed the source code for others to experiment upon, with attribution, under the GNU General Public License.


This chain of fail follows a more winding path than the others. While the PRSLearn and home automation fails began with singular, tinkerer’s goals, those that follow hit a little closer to home. While the previous projects were deemed failures for structurally falling short of design goals, what follows failed due to sentiments and false hypotheses. Perhaps there is something to be learned from the comparison?


Volunteer first response agencies suffer significantly high burnout and washout rates. The labors involved are neither easy nor routine, and some of the inherent stressors can quickly and efficiently dispel one’s illusions of a kind and just world. Some will reject this quickly, some have the stomach for it, but I fear many may experience a slow indigestion from poorly processed memories and exposures. Given my interest in first response, I’ve been contemplating whether my laboratory’s epidemiological methods could provide some insight into identifying and assisting those dealing with critical incident stress.

Looking for convenient metrics and low-hanging fruit, I wondered whether less socially integrated responders might fare worse during a critical stress incident. If we could blindly and computationally identify members who interacted with the team at a reduced rate, perhaps we might know to keep a better eye on them and offer support more readily when needed. Borrowing from network theory, a convenient metric for this might be one’s degree of connectedness and centrality within a given social network. Facebook provides a rich source of this data, but initial proposals to mine it were received poorly. I took the perceived slight to an agency’s privacy to heart and set out exploring the Facebook API to see whether such data was truly publicly available.
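To make the metric concrete, here is a minimal, pure-Python sketch of degree centrality over an invented in-group network. The real edge data never materialized, so the member names and friendships below are placeholders, not anything collected:

```python
# Sketch: degree centrality over a toy in-group friendship network.
# Member names and edges are invented stand-ins for roster data.
friendships = {
    ("alice", "bob"), ("alice", "carol"), ("bob", "carol"),
    ("carol", "dave"), ("eve", "dave"),
}

members = sorted({m for edge in friendships for m in edge})

def degree_centrality(edges, nodes):
    """Fraction of the other nodes each node is directly tied to."""
    degree = {n: 0 for n in nodes}
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return {n: degree[n] / (len(nodes) - 1) for n in nodes}

centrality = degree_centrality(friendships, members)
# The least-connected member is the one to keep a better eye on.
least_connected = min(centrality, key=centrality.get)
```

The idea was never more sophisticated than this: rank members by connectedness and flag the tail of the distribution for extra attention.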

It wasn’t. It could have been collected with individual app permission requests, which would almost certainly surpass one’s threshold for creepy. It also could have been collected with pen and paper, manually compiling a web of individuals’ in-group network edges. But, as such access would depend on privileged, non-public relations, it would certainly violate the spirit of those relations, and the idea was scrapped.

The question now became whether anything useful had been constructed along the way, and whether the developed tools would be more appropriate for a different task. I hammered out a quick crawler script for public Facebook groups and pages. The idea was to take a set of keywords, group types, and geographies, start with a seed page, then slowly crawl its network of likes and liked-bys until every page matching the query set had been found. I took first response agencies as a general sector of public interest which would likely maintain a Facebook page and could potentially prove useful later, and created the following network:
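The crawl loop itself is a plain breadth-first search over the like graph. In this sketch the Graph API call is stubbed out with a toy network; `fetch_liked_pages`, the page names, and the query filter are all invented for illustration:

```python
from collections import deque

# Toy like-network standing in for real Graph API responses.
TOY_NETWORK = {
    "seed_fd": ["county_ems", "dive_team"],
    "county_ems": ["seed_fd", "hardware_store"],
    "dive_team": ["seed_fd"],
    "hardware_store": [],
}

def fetch_liked_pages(page_id):
    """Placeholder for the API call returning pages liked by page_id."""
    return TOY_NETWORK.get(page_id, [])

def crawl(seed, matches_query=lambda page: True):
    """Breadth-first walk of the like network from a seed page.

    Records every like edge seen, but only expands pages that
    match the keyword/type/geography query."""
    seen, edges = {seed}, []
    queue = deque([seed])
    while queue:
        page = queue.popleft()
        for liked in fetch_liked_pages(page):
            edges.append((page, liked))
            if liked not in seen and matches_query(liked):
                seen.add(liked)
                queue.append(liked)
    return seen, edges

pages, edges = crawl("seed_fd")
```

The `matches_query` hook is where the keyword, group type, and geography constraints would live, keeping the frontier from wandering off into the whole of Facebook.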

[Interactive network visualization: controls in bottom left, hover over nodes for information]

It’s pretty, but it’s not yet useful. Maybe the highly central, red-to-yellow agencies could prove rich ground for testing technologies and innovations? I’d certainly wish to talk to them first if I were selling equipment, but I don’t sell equipment. The idea remained a curiosity and sat on the back burner for a while.


Southern Soma

Years ago, in a chat with a fellow futurist colleague, I pondered, “how long until we invent ourselves out of jobs and sit around taking soma all day?” My colleague responded, “it’s already happening dude, do you mean the rural meth epidemic?” Well, in 2017 we are staring down the twin barrels of meth and opiate epidemics as well as a general spike in blue collar deaths of despair. The factory jobs aren’t coming back, the retail and transportation sectors are gazing into the abyss, and the canary in this coal mine will be well and truly dead before our national culture ever dares to acknowledge the utility of a mandatory minimum wage.

The modern opiate epidemic presents a crucial challenge to public health; its nature as an economically driven disease of despair defies easy solution. One glimmer of a potential trail of causation appeared via the 2017 Appalachian Studies Conference, where a poster by Carilion researchers identified pain management following occupational injuries as a pathway to addiction in blue collar, rural communities. I’d had my own run-in with chronic pain following a torn meniscus during a rescue call. Absent proper treatment, the condition slowly worsened for about a year before a second injury landed me in physical therapy. PT was miraculous: almost every notable marker of pain and performance improved significantly over the span of a month until I had returned roughly to my prior state. Taking this in the context of the opiate epidemic, I wondered: if occupational injuries are a stepping stone into addiction, could physical therapy present a viable intervention? Short term pharmaceutical management of pain would undoubtedly be cheaper than time with a trained therapist, and one could easily imagine this presenting a perverse incentive for insurers. Maybe some communities would resist such incentives and have greater access to physical therapy; maybe this would even grant a preventative effect?


Quantifying Access

With these questions in mind, I set out to conduct a preliminary analysis of whether differential access to physical therapists proved protective against opiate related morbidity and mortality across rural Appalachia. I defined access as the number of physical therapy clinics in a county per 10,000 people. As offered by my friend and colleague Alex Telionis, a more comprehensive study could have sought mean population travel time to physical therapists, but this was truly just a test of plausibility. For a negative outcome, after asking the Facebook hivemind for suggested markers, I decided upon CDC county mortality stats from the WONDER database.

My first attempt to quantify the number of physical therapists per county was to reuse the Facebook net crawler from the first response agency centrality graph. I picked a physical therapy clinic as a starting point and pulled all pages in the United States with a category of physical therapist that were connected to the outlying like network of the starting therapist. The resulting network may be found here:

[Interactive network visualization: controls in bottom left, hover over nodes for information]

These results were lacking; offhand testing showed numerous area therapists simply weren’t visible on this network. I could easily imagine that on a local scale we might see isolated cliques via rival provider networks. However, zooming out far enough, you’d imagine that at least one smaller, private facility or agency would have attempted to curry favor with both sides, thereby connecting the different cliques to the total graph. The more obvious conclusion was that physical therapists just weren’t that into Facebook. Moving on.


Scraping the web

Lacking an easy source of PT clinic geolocations, I trawled the web and hit the hivemind once more. Many states keep professional registries of clinics, but these are often independent efforts that would require a significant amount of paperwork, registration, and parsing to incorporate. Framing this as a curiosity project, there was just no way to requisition that much time without any initially promising findings. I reached out to some national registries and associations to see what they had to offer, but the results were likewise discouraging. Then I stumbled upon a national registry and, checking its terms of service, found it was open for informational and educational use. Bingo! I threw up a quick web scraper with BeautifulSoup to pull down clinic addresses across Appalachia, geocoded them to coordinates via the Python geocoder library, and matched them to TIGER shapefile counties via geopandas to get a clinics-per-10k count.
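The pipeline looked roughly like the sketch below. The HTML structure is invented, and the geocoding and county-matching stages are collapsed into a stub lookup table; the real script used the geocoder library to resolve addresses to coordinates and a geopandas spatial join against TIGER county polygons to resolve coordinates to FIPS codes:

```python
from bs4 import BeautifulSoup

# Invented sample of a registry listing page; the real markup differed.
SAMPLE_HTML = """
<div class="clinic"><span class="addr">12 Main St, Abingdon, VA</span></div>
<div class="clinic"><span class="addr">4 Oak Ave, Galax, VA</span></div>
"""

def scrape_addresses(html):
    """Pull clinic address strings out of the listing markup."""
    soup = BeautifulSoup(html, "html.parser")
    return [tag.get_text(strip=True) for tag in soup.select(".clinic .addr")]

# Stand-in for geocoding + the geopandas spatial join: address -> county
# FIPS. The FIPS codes and populations here are illustrative only.
ADDRESS_TO_COUNTY = {
    "12 Main St, Abingdon, VA": "51191",
    "4 Oak Ave, Galax, VA": "51640",
}
COUNTY_POP = {"51191": 54000, "51640": 6600}

def clinics_per_10k(addresses):
    """Count clinics per county and normalize per 10,000 residents."""
    counts = {}
    for addr in addresses:
        fips = ADDRESS_TO_COUNTY[addr]
        counts[fips] = counts.get(fips, 0) + 1
    return {f: 10_000 * n / COUNTY_POP[f] for f, n in counts.items()}

rates = clinics_per_10k(scrape_addresses(SAMPLE_HTML))
```

Swapping the lookup table for a real geocoder call and an `sjoin` against the county shapefile is the only structural change the full version needed.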


Quantifying Mortality

The next task was to quantify the negative effect of opiates on a given area. I went with mortality, as it was likely to be more reliable than any publicly available figures on ongoing abuse or addiction. I can’t imagine individuals are likely to disclose their addictions to a reliable and consistent degree across territories, nor do I imagine that municipalities or practitioners are consistently incentivized to publish their opiate throughput, but dead means dead. I went with the CDC WONDER mortality database and ran a query pulling county level age adjusted death rates for working age men and women from HHS regions 2, 3, and 4, a geography containing the bulk of Appalachia. Within these, I picked the subset of counties marked as micropolitan and non-core by the 2013 urbanization index, here used as a proxy for rural. Beyond my concern with blue collar populations, the hypothetical interplay between physical therapy and addiction could be completely different in nature between rural and urban regions via employment types, insurance coverage, and cultural considerations.
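Applying the rural proxy is a one-liner once each county carries its urbanization code. In the 2013 NCHS urban-rural scheme, codes 1 through 4 are metro tiers, 5 is micropolitan, and 6 is noncore; the sample rows below are invented:

```python
# 2013 NCHS urban-rural codes: 1-4 are metro tiers, 5 = micropolitan,
# 6 = noncore. Codes 5 and 6 serve as the rural proxy here.
RURAL_CODES = {5, 6}

# Invented sample rows: (county FIPS, 2013 urbanization code).
counties = [
    ("51191", 5), ("51640", 6), ("51770", 3), ("54039", 3), ("54005", 6),
]

rural = [fips for fips, code in counties if code in RURAL_CODES]
```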

The next challenge was in crafting a case definition for mortality due to opiates. Looking through the ICD-10 codes, the F11 codes seemed to provide a pretty strong catch-all for opioid mortality. I submitted the query and received... almost nothing. A small handful of county stats were returned, with most marked as unreliable. Coming from my experience surveilling public social media posts, it was a potent reminder that privacy matters: all counties with fewer than 20 deaths in a given time frame were removed from the results to prevent unscrupulous researchers from re-identifying individuals. As I was looking specifically at regions with low population densities, such a quantity of overdoses would have bordered on the apocalyptic. For now, the entire bottom quantile was missing from my data set, and with the remainder countable offhand, no meaningful insight could be interrogated from this data.
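Cleaning the export down to usable rows was then mechanical; the column names and flag strings below are guesses at the shape of a WONDER export rather than a verified schema:

```python
# Invented sample of a WONDER-style export; the real layout differed.
rows = [
    {"county": "51191", "deaths": "Suppressed", "rate": "Suppressed"},
    {"county": "54039", "deaths": "25", "rate": "Unreliable"},
    {"county": "54081", "deaths": "31", "rate": "22.4"},
]

def usable(row):
    """Keep only rows with a numeric, non-flagged death rate."""
    return row["rate"] not in ("Suppressed", "Unreliable")

clean = [r for r in rows if usable(r)]
```

With the F11 query, almost nothing survived this filter.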


Mission Creep

Still eager to see some sort of connection, I resorted to broadening my case definition to get past that 20 case limit. I hit every opioid related ICD-10 mortality code, I boosted the year range of the data to 5, 10, and 15 years in succession, and I widened the age range to all adults. The results were negligible: the correlation was both weak and negative, and countless counties lacked any form of clinic. This goose was cooked. I could have further tweaked the data, perhaps stratified it differently or re-evaluated my case definitions. I also could have looked into fitting a model to the case counts and back-predicting numbers for those sanitized counties, as was done with this recent Zika study. But really, it’s just a negative result on an analysis borne of curiosity. I’d already delved as far as I was willing to go into the dark side of data mining and would almost certainly induce bias by going further. Moreover, going back beyond 5 years would likely introduce severe flaws due to changes in local conditions, clinic access, and cultural contexts for addiction.
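The final check reduces to a correlation between two county-level series: clinics per 10k on one side, opioid death rates on the other. A plain Pearson coefficient suffices; the numbers below are invented for the sketch, not the study's data:

```python
from math import sqrt

def pearson(xs, ys):
    """Plain Pearson correlation coefficient for two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Invented county-level values: clinics per 10k vs. opioid death rate.
clinics = [0.0, 0.4, 0.9, 1.3, 0.2]
deaths = [21.0, 19.5, 22.8, 18.0, 20.1]
r = pearson(clinics, deaths)
```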


Sad pandas


The lessons

It was interesting to see how the scrapers came in for both cases. People may complain about social media being noisy and unreliable, and it is, but there is no such thing as perfect data. The blessing of public health is that we’re not looking to prove a theory or discover a particle; we’re firemen trying to stamp out our epidemics. If something can be done a little better or a little faster than before, then it is inherently worth doing. It was actually shocking how complete and accurate the first responder net was, as those pages not created by the agencies themselves were often manually created by the community, complete with up-to-date location data. This could pay off in quite a few scenarios. In the event of a large scale crisis, having a geocoded map and contact numbers for all the hardware and big box stores could prove invaluable for resource allocation. Likewise, in the event of a suspected common source outbreak, identifying and locating all franchises in an exposed chain may prove useful.

There were quite a few flaws in my approach to the opiate study owing to my time constraints. As stated before, seeking the number of practitioners per 10k rather than private clinics would have been a far better metric, as many practitioners travel for in-home care. Second, small rural counties may simply lack the population to sustain a dedicated clinic, their residents seeking care either through local hospital practitioners or through adjacent counties. Third, who is to say all clinics are on the same level? Someone is writing these prescriptions; to assume that all clinics involved in pain management are playing on the side of the angels borders on naive.

Finally comes the question of data mining. I’m pretty ambivalent on the idea as a whole, though it’s certainly a source of division. On one hand, we have vast pools of data and zippy tools that were simply not available before, and we’d be fools not to use them. On the other, if you publish a thousand data mined studies at a 0.05 p-value threshold, you should expect on the order of 50 false positive conclusions among them. These conclusions may inform the development of drugs and the deployment of safety measures. They may squander the funds of more efficacious programs as well as the already dwindling faith of the public. As with ChatterGrabber, I’d consider such mined and scraped data sources an alarm system: a signal that something may be happening and that the ground truth must be known.



  1. 2015 TIGER/Line Shapefiles (machine-readable data files) / prepared by the U.S. Census Bureau, 2015
  2. Centers for Disease Control and Prevention. CDC Wonder. March 2017.
  3. (n.d.). Retrieved March 17, 2017, from