What makes analysis complex?

Dates: Jan 2015
There are myriad ways to formalize, visualize, and query a dataset, but how well have these helped people make sense of it? Analysis is generally a deconstructive sensemaking practice whereby something complex is broken into its fundamental parts (Beaney, 2015). To analyze a complex dataset, then, would be to break the set into its individual values. While a value may be intrinsically significant given the reason for its collection, it should not be given precedence over the rest. Yet analyzing every value of a large dataset individually is beyond the scope of human cognition and requires some form of augmentation (Pope & Josang, 2005). Looking at one value in isolation, ignorant of the rest, how could one tell whether it is signal or noise, regular or random? Data wrangling techniques help here by mapping values onto more human-friendly, interactive dimensions such as color and space (Heer & Shneiderman, 2012). Visual data analysis helps people make sense of large, complex datasets by also letting the analyst interact with the data, changing those human-friendly dimensions on the fly (Endert, Fiaux, & North, 2012). Despite these affordances and the mathematical modeling behind them, however, the works cited do not necessarily ascertain the analyst's intentions or goals.
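As a minimal, invented sketch of the kind of mapping described above (not taken from any of the cited works), consider rescaling raw values onto two human-friendly dimensions, horizontal position and grey level, so that an outlier invisible in a column of numbers becomes easy to spot:

```python
# Illustrative only: map raw values onto position and color so that
# an outlier stands out visually rather than numerically.

def to_visual(values, width=100):
    """Map each value to an (x_position, grey_level) pair.

    x_position: value rescaled into [0, width] pixels.
    grey_level: 0 (black) for the minimum up to 255 (white) for the maximum.
    """
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1  # avoid division by zero on constant data
    marks = []
    for v in values:
        t = (v - lo) / span               # normalize to [0, 1]
        marks.append((round(t * width), round(t * 255)))
    return marks

marks = to_visual([2, 3, 2, 4, 98])
# The outlier 98 lands at the far right edge, in near-white,
# while the cluster of small values huddles at the left in near-black.
```

The point of the sketch is that the transformation itself is trivial; the cognitive gain comes entirely from handing the comparison over to the eye.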

Perhaps the most complicated aspect of analysis is framing: making sense of the data in relation to each other and to the world. To find meaning both in the data itself and in the affordances granted by visualization, analysis requires the mental organization and aptitude to frame what is presented (Heuer, 1999). Rather than shareable infographics or technically impressive visualizations delivering conclusions to our news feeds, technology should be deployed at critical points in a comprehensive analysis (Pirolli & Card, 2005). Overuse or misuse of technology for making sense of data could exacerbate an already complex problem. Although individual data points or subsets of a dataset are irreducible, they have both intrinsic and relative meanings. For example, criminal activity is more readily identifiable in the chat logs of a particular IRC channel than in all IRC traffic passing through a particular server. How, then, given only the latter, could intelligence analysts investigate reports of criminal activity if they have tapped just that one server? If humans cannot naturally handle a bottom-up approach to huge datasets (Pope & Josang, 2005), sensemaking must happen top-down to a Goldilocks threshold: not so high-level that details are indiscernible, and not so low-level that analysts get muddled in individual data points.
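The top-down approach can be sketched with a toy version of the IRC scenario. Everything here is invented for illustration (the log lines, the keyword list standing in for a real detection model): the analyst starts from one summary number per channel and drills into a single channel only after its summary stands out.

```python
# Illustrative sketch of top-down sensemaking: aggregate first,
# drill down second. Data and the "model" are toy examples.
from collections import Counter

logs = [
    ("#dev",   "pushing the fix now"),
    ("#trade", "selling card dumps, DM me"),
    ("#trade", "fresh dumps available"),
    ("#dev",   "tests are green"),
]

SUSPICIOUS = {"dumps"}  # stand-in for a real classifier or watchlist

# Top level: one flag count per channel, not one row per message.
flags = Counter(
    channel for channel, msg in logs
    if any(keyword in msg.lower() for keyword in SUSPICIOUS)
)

# Drill down only into the channel that stood out at the top level.
worst = flags.most_common(1)[0][0]
detail = [msg for channel, msg in logs if channel == worst]
```

The Goldilocks threshold lives in the aggregation step: summarize too coarsely (one number for the whole server) and `#trade` never surfaces; skip aggregation entirely and the analyst is back to reading every message.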

Either way, the complexity of sensemaking and of seeking actionable information is reflected in the complexity of the technology that enables them. Dynamic querying evolved from a possible general approach (Shneiderman, 1994) into one component of a sprawling, yet still incomplete, taxonomy of approaches to visual data analysis (Heer & Shneiderman, 2012). This breadth and depth of approaches, alongside cognitive models of analysis, demonstrates the complexity of the very tools dedicated to reducing the complexity of, or at least elucidating meaning in, complex datasets. Future work in data analytics, such as what-if analysis involving predictive analytics, would require better handling of the cognitive and statistical models that motivate and substantiate analysis, for example through semantic interaction (Endert, Fiaux, & North, 2012).
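The core of the dynamic-query idea (Shneiderman, 1994) fits in a few lines: each movement of a filter control is just another evaluation of a predicate over the dataset, with no separate "run query" step. The records and field names below are invented for illustration.

```python
# Hedged sketch of dynamic querying: the view re-renders from the
# result of every slider movement. Dataset and fields are made up.

records = [
    {"channel": "#help",  "messages": 120},
    {"channel": "#trade", "messages": 4500},
    {"channel": "#dev",   "messages": 860},
]

def dynamic_query(data, field, lo, hi):
    """Return the records whose `field` falls inside the slider range [lo, hi]."""
    return [r for r in data if lo <= r[field] <= hi]

# Each adjustment of the range is immediately reflected in the subset.
busy = dynamic_query(records, "messages", 500, 5000)
```

What made the technique influential was not the filter itself but the tight perception-action loop: because results update continuously, the analyst explores the query space by feel rather than by composing queries in advance.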

References

  • Beaney, M. (2015). Analysis. In E. N. Zalta (Ed.), The Stanford Encyclopedia of Philosophy (Spring 2015 ed.). Retrieved from http://plato.stanford.edu/archives/spr2015/entries/analysis/
  • Endert, A., Fiaux, P., & North, C. (2012). Semantic interaction for visual text analytics. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (pp. 473–482). ACM. Retrieved from http://dl.acm.org/citation.cfm?id=2207741
  • Heer, J., & Bostock, M. (2010). Crowdsourcing graphical perception: using mechanical turk to assess visualization design. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (pp. 203–212). ACM. Retrieved from http://dl.acm.org/citation.cfm?id=1753357
  • Heer, J., & Shneiderman, B. (2012). Interactive dynamics for visual analysis. Queue, 10(2), 30. Retrieved from http://dl.acm.org/citation.cfm?id=2146416
  • Heuer, R. J., & Center for the Study of Intelligence (U.S.). (1999). Psychology of intelligence analysis. [Washington, D.C.]: Center for the Study of Intelligence, Central Intelligence Agency. Retrieved from https://www.cia.gov/library/center-for-the-study-of-intelligence/csi-publications/books-and-monographs/psychology-of-intelligence-analysis/
  • Pirolli, P., & Card, S. (2005). The sensemaking process and leverage points for analyst technology as identified through cognitive task analysis. In Proceedings of International Conference on Intelligence Analysis (Vol. 5, pp. 2–4). Mitre McLean, VA. Retrieved from http://vadl.cc.gatech.edu/documents/2__card-sensemaking.pdf
  • Pope, S., & Josang, A. (2005). Analysis of competing hypotheses using subjective logic. DTIC Document. Retrieved from http://oai.dtic.mil/oai/oai?verb=getRecord&metadataPrefix=html&identifier=ADA463907
  • Shneiderman, B. (1994). Dynamic queries for visual information seeking. IEEE Software, 11(6), 70–77. Retrieved from https://dl.acm.org/citation.cfm?id=625405