Capturing, Interpreting, and Visualizing User Interaction

User interaction in computing can be captured through many devices, most of which operate on a spatial metaphor (see Appendix). In the visual data analytics (VDA) literature we collected this semester, interaction is most often captured as cursor positions and clicks, and as text entry chains and buffers.

These interactions are usually recorded as timestamped system events, and sometimes through screenshots, video, or a researcher’s own observation. The captures can then be interpreted absolutely, combinatorially, or contextually. An absolutely interpreted event stands on its own: a click, for example, could be read as the selection of a virtual object. Some events can be combined: a click followed by a change in cursor position indicates a dragging operation. Finally, some events, often analyzed more closely by a researcher, can be interpreted contextually as tasks. For example, Heer and Shneiderman define 12 types of such tasks that enable a dialog in analysis [1]. This contextual capture of dialog is perhaps more informative and more directly modelable than the former two interpretations for analytic provenance, that is, for making sense of sensemaking processes. The interpretations themselves can be visualized through UI metaphors and idioms to aid in understanding analytic provenance. The visualization can be a temporal mapping [2] or cumulative value [3] of spatial data captured by low-level events like clicks, or it can present higher-level data, such as state changes in applications [4], through more complex visualizations.
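
The difference between absolute and combinatorial interpretation can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Event record, the "down"/"move"/"up" event kinds, and the move_threshold parameter are hypothetical and do not come from any of the cited tools. A press and release with no meaningful cursor movement is read absolutely as a click, while a press, a movement, and a release are combined into a drag.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """A hypothetical timestamped low-level event (not any tool's real format)."""
    t: float   # timestamp in seconds
    kind: str  # "down", "move", or "up"
    x: int
    y: int


def interpret(events, move_threshold=3):
    """Combine low-level events into 'click' or 'drag' interpretations."""
    results = []
    start = None   # the most recent unmatched "down" event
    moved = False  # whether the cursor left the threshold while pressed
    for e in sorted(events, key=lambda ev: ev.t):
        if e.kind == "down":
            start, moved = e, False
        elif e.kind == "move" and start is not None:
            if abs(e.x - start.x) + abs(e.y - start.y) > move_threshold:
                moved = True
        elif e.kind == "up" and start is not None:
            label = "drag" if moved else "click"
            results.append((label, start.t, e.t, (start.x, start.y), (e.x, e.y)))
            start = None
    return results


# Hypothetical log: one stationary press/release, then a press with movement.
log = [
    Event(0.00, "down", 10, 10),
    Event(0.05, "up", 10, 10),
    Event(1.00, "down", 20, 20),
    Event(1.10, "move", 60, 45),
    Event(1.30, "up", 80, 50),
]

for item in interpret(log):
    print(item)
```

Contextual interpretation, by contrast, would need information that this log does not contain, such as which virtual object was under the cursor or what the analyst was trying to accomplish.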

The current trend in software for this domain seems to be heavy use of metaphor rather than idiom in the interface. Glass Box’s tabular view, CZNotes’ notes (along with those of the tools it cites and of CommentSpace) [5], and Brown et al.’s traced map are all examples of metaphorical interfaces. An idiomatic virtual interface, by contrast, is one whose meaning is learned rather than bound to a metaphorical source. Most interface elements, such as a scrollbar or a mouse and cursor, now rely primarily on idiomatic design [6], as does Heer and Shneiderman’s taxonomy. Only VisTrails seems to take a more idiomatic approach, interpreting provenance data as states or tasks and visualizing it in more abstract ways [4].

Before computing, and in some cases since, provenance was observed and recorded manually. Computing has since made automatic recording possible, affording finer-grained access to events and enough processing power to capture whole analytic processes. Like software and service development, data analytics can suffer from waterfall approaches to understanding and attacking a problem. However, like development processes, data analytics has also benefited from more iterative and reflexive approaches [4]. Analytic provenance can now be a means of improving a product or service, for instance by using VisTrails [7]. This can be applied both to data from product use and to the gathering of that data itself. As far as computing has so far allowed, interaction can be automatically analyzed at a higher level by provenance-affording software. Software like Glass Box [8] captures low-level events like mouse clicks as well as higher-level events like operating system events and screenshots. Data like this must first be machine-readable and then be made interpretable by human analysts. The analyst using Glass Box can then manually annotate the captured events, which makes them immediately human-readable. Its visualizations, though, are very literal in both process and temporality. Both the over-the-shoulder and tabular review views display all events on a timeline. Perhaps this is influenced by the tendency of analysts to describe their sensemaking “strategies in terms of the visualization and interface, instead of the semantic meaning of their actions,” as Lipford et al. [9] found in their studies on helping users recall their reasoning process. Indeed, using a tool that afforded analytic provenance allowed the analysts to better recall, and gain confidence in, the methods they used to make sense of their data set.

By considering combinations and pluralizations of the data captured and interpreted by analytic systems (see Appendix), such as a particular set of actions like dragging a set of virtual objects to a different space on the screen, visual analytics processes can be understood more semantically. This understanding can then perhaps be better visualized, from an analytic provenance perspective, as an entity in itself. For example, the set of actions a user normally takes to recover from an error or dead end in an analytic process could be difficult to pick out automatically but may be obvious when a human watches a recording of the actions. Just as semantic interaction with data can facilitate sensemaking by allowing the user to operate within an interface metaphor or idiom [10], the same kind of interaction with the provenance of their insights could be understood just as well.

What could it mean for provenance data to be semantically interactive? Rather than reviewing a list of system events as a video recording of the analyst’s workflow plays, what if the provenance analyst could chunk the recording and events into short recordings of particular tasks? Those chunks could then be visually represented as connected to subsequent tasks in a state machine representation. This sort of abstract representation of analytic provenance, such as the one VisTrails uses, could be a subject for further research, as it would attempt to model the user’s own mental model of the data and their intentions during sensemaking rather than just the actions expressed and captured. This could in turn influence new affordances and paradigms in provenance analysis tools.
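
As a rough illustration of this idea, the Python sketch below takes chunks that an analyst has already labeled by hand and connects each one to the task that follows it. Everything here is an assumption made for the example: the task labels, the timings, and the function names are hypothetical and are not how VisTrails or any other cited tool represents provenance.

```python
from collections import Counter, defaultdict

# Hypothetical analyst-labeled chunks: (task label, start second, end second).
chunks = [
    ("filter", 0, 12),
    ("select", 12, 20),
    ("annotate", 20, 41),
    ("filter", 41, 50),
    ("select", 50, 58),
    ("undo", 58, 61),
    ("filter", 61, 70),
]


def transitions(labeled_chunks):
    """Count how often each task is directly followed by each other task."""
    ordered = sorted(labeled_chunks, key=lambda c: c[1])  # order by start time
    labels = [c[0] for c in ordered]
    return Counter(zip(labels, labels[1:]))


def as_state_machine(counts):
    """Regroup transition counts as {source task: {next task: count}}."""
    machine = defaultdict(dict)
    for (src, dst), n in counts.items():
        machine[src][dst] = n
    return dict(machine)


machine = as_state_machine(transitions(chunks))
for src, nexts in machine.items():
    print(src, "->", nexts)
```

Each key in the resulting mapping is a state and each nested entry is an observed transition, so a node-link drawing of the mapping would give the kind of abstract view described above. The labeling step itself remains manual in this sketch.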


[1] J. Heer and B. Shneiderman, “Interactive dynamics for visual analysis,” Queue, vol. 10, no. 2, p. 30, 2012.

[2] E. T. Brown, A. Ottley, H. Zhao, Q. Lin, R. Souvenir, A. Endert, and R. Chang, “Finding Waldo: Learning about Users from their Interactions,” 2014.

[3] L. Bradel, C. Andrews, A. Endert, K. Koch, K. Vogt, D. Hutchings, and C. North, “Large High Resolution Displays for Co-Located Collaborative Intelligence Analysis,” 2011.

[4] C. T. Silva, E. Anderson, E. Santos, and J. Freire, “Using VisTrails and Provenance for Teaching Scientific Visualization,” Computer Graphics Forum, vol. 30, no. 1, pp. 75–84, Mar. 2011.

[5] E. Lee, A. Gupta, D. Darvill, J. Dill, C. D. Shaw, and R. Woodbury, “The CZSaw notes case study,” in IS&T/SPIE Electronic Imaging, 2013, pp. 901706–901706.

[6] A. Cooper, “The myth of metaphor,” Visual Basic Programmer’s Journal, pp. 127–128, 1995.

[7] VisTrails: Using Provenance & Workflows for Scientific Exploration. Argonne Training Program on Extreme-Scale Computing, 2014.

[8] P. Cowley, L. Nowell, and J. Scholtz, “Glass box: An instrumented infrastructure for supporting human interaction with information,” in System Sciences, 2005. HICSS’05. Proceedings of the 38th Annual Hawaii International Conference on, 2005, p. 296c–296c.

[9] H. R. Lipford, F. Stukes, W. Dou, M. E. Hawkins, and R. Chang, “Helping users recall their reasoning process,” in Visual Analytics Science and Technology (VAST), 2010 IEEE Symposium on, 2010, pp. 187–194.

[10] A. Endert, P. Fiaux, and C. North, “Semantic interaction for visual text analytics,” in Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 2012, pp. 473–482.


Appendix: an overview of the capture, interpretation, and visualization of interaction, organized as follows:

  • capture device
    • functional interpretation
      • visualization (metaphor, idiom)

The list:

  • mouse (operates on space)
    • presence
      • cursor
    • select
      • virtual object state change
    • hover
      • same as select, without persistence
    • scroll
      • displaying/selecting different parts of one object
    • drag
      • movement of virtual object
    • trace
      • movement of self, with persistence
  • touchscreen, pen/stylus (operates on space)
    • presence
      • touchpoint
    • select
      • virtual object state change
    • hold
      • same as select, persistent for duration of interaction
    • pinch
      • change field of view of space
      • change spatial property of virtual object
    • rotate
      • change orientation of space
      • change spatial property of virtual object
    • trace
      • movement of self or virtual object, with persistence
  • keyboard, game controller
    • single press
      • literal key meaning (print “f”, “print screen”, “enter”, “exit”)
      • contextual/relative key meaning (“backspace” = “go back”)
    • buffer
      • single press -> literal, separate execution function
    • chain
      • same as buffer, but executed on last key entry
  • microphone (operates on system)
    • signal/noise detection
      • amplitude, frequency, period, and phase
    • recognition
      • same as signal/noise detection, with selective filtering
  • joystick (operates on space)
    • same as mouse, but often operates on entire space at once
      • moves space around reticle
      • moves reticle around space, same as mouse interactions
  • accelerometer, gyroscope (operates on space or system)
    • same as joystick or mouse, but can be meaningfully used without visual context
      • two-dimensional, usually over time
  • camera (operates on system)
    • same as microphone, but visual
      • same as microphone, but more efficiently operates on space
  • passive sensors (operates on system)
    • same as microphone, but can be meaningfully used without constant feedback
      • same as microphone and accelerometer/gyroscope
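
The three-level outline above can also be read as a nested mapping from capture device to functional interpretation to visualization. The Python sketch below encodes only a small excerpt of the list to show the shape of such a structure; the dictionary layout and the devices_supporting helper are hypothetical choices made for this example.

```python
# Excerpt of the outline as device -> interpretation -> visualizations.
taxonomy = {
    "mouse": {
        "presence": ["cursor"],
        "select": ["virtual object state change"],
        "drag": ["movement of virtual object"],
    },
    "touchscreen": {
        "presence": ["touchpoint"],
        "select": ["virtual object state change"],
        "pinch": [
            "change field of view of space",
            "change spatial property of virtual object",
        ],
    },
}


def devices_supporting(interpretation):
    """Return the devices in the excerpt that offer a given interpretation."""
    return sorted(
        device
        for device, interpretations in taxonomy.items()
        if interpretation in interpretations
    )


print(devices_supporting("select"))
print(devices_supporting("pinch"))
```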