In:Eye-tracking in Interaction: Studies on the role of eye gaze in dialogue
Edited by Geert Brône and Bert Oben
[Advances in Interaction Studies 10] 2018
► pp. 109–138
Chapter 6. Quantifying the interplay of gaze and gesture in deixis using an experimental-simulative approach
Published online: 13 November 2018
https://doi.org/10.1075/ais.10.06pfe
Abstract
Gaze and gesture have traditionally been studied qualitatively (Kendon, 1990; McNeill, 1992; Kendon, 2004; McNeill, 2006). A quantitative assessment of gaze and gesture in dialogue, in particular of precise orientations, positions, and timings, has only become possible with the advent of advanced measuring technologies such as motion capture and eye tracking. Especially in dynamic natural settings, in which interlocutors engage with their surrounding three-dimensional environment, a precise three-dimensional reconstruction of the set-up is required to analyze the multimodal utterances produced.
In this article, we review several of our past projects, focusing on our experimental-simulative approach, in which we combine state-of-the-art tracking technologies with 3D representations and computer simulations to test hypotheses about deixis in human-human interaction.
Keywords: gaze, speech, gestures, eye tracking, motion capturing, 3D simulation
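A central operation in the simulative part of such an approach is testing which objects in the reconstructed 3D scene fall within the region singled out by a recorded pointing gesture, modelled, for instance, as a pointing cone (cf. Kranstedt et al., 2006c). The following Python sketch illustrates the idea only; it is not the authors' implementation, and the function name, the object representation, and the 15° cone half-aperture are assumptions chosen for illustration.

```python
import numpy as np

def objects_in_pointing_cone(hand_pos, direction, objects, half_angle_deg=15.0):
    """Return names of objects whose centres lie within a cone that
    originates at the hand position and opens along the pointing direction.

    hand_pos       -- 3D position of the pointing hand
    direction      -- 3D pointing direction (need not be normalised)
    objects        -- dict mapping object name to 3D centre position
    half_angle_deg -- cone half-aperture; an assumed illustrative value
    """
    d = np.asarray(direction, dtype=float)
    d /= np.linalg.norm(d)
    cos_limit = np.cos(np.radians(half_angle_deg))
    hits = []
    for name, pos in objects.items():
        v = np.asarray(pos, dtype=float) - np.asarray(hand_pos, dtype=float)
        dist = np.linalg.norm(v)
        if dist == 0.0:
            continue
        # An object counts as demonstrated if the angle between the pointing
        # direction and the hand-to-object vector is within the aperture.
        if np.dot(d, v) / dist >= cos_limit:
            hits.append((name, dist))
    # Order candidates by distance, nearest first.
    return [name for name, _ in sorted(hits, key=lambda t: t[1])]
```

With the hand at the origin pointing along the x-axis, an object at (1, 0, 0) is returned while one at (0, 1, 0) is not; ordering candidates by distance allows proximal referents to be preferred when the cone covers several objects.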
Article outline
- 1. Introduction
- 2. An experimental-simulative approach
- 2.1 Quantification using tracking technology
- 2.2 Reconstructing a 3D situation model
- 2.3 Simulations in the 3D situation model
- 2.4 Analysis and results
- 2.5 Summary
- 3. Example A: A study on deixis
- 3.1 Background
- 3.2 Scenario
- 3.3 Tracking setup
- 3.4 Reconstructed 3D situation model
- 3.5 Simulations in the 3D situation model
- 3.6 Results
- 3.6.1 Distribution of hand positions
- 3.6.2 Direction of pointing gestures
- 4. Example B: Deictic gaze of two interlocutors in a search scenario
- 4.1 Background
- 4.2 Scenario
- 4.3 Method
- Tracking
- Reconstructed 3D situation model
- Simulations in the 3D situation model
- 4.4 Results
- 4.4.1 Automatic annotation of gaze targets
- 4.4.2 Identifying patterns of shared gaze
- 4.5 Summary
- 5. Example C: Gaze and deixis in shared space of two interlocutors
- 5.1 Background
- 5.2 Scenario
- Task
- 5.3 Method
- Tracking
- Reconstructed 3D situation model
- Simulations in the 3D situation model
- 5.4 Results
- 5.5 Summary
- 6. Conclusion
- 7. Outlook: Affordable tracking solutions
- Acknowledgment
References
Advanced Realtime Tracking GmbH (2016). ART Advanced Realtime Tracking Company Website. [URL], last checked October 2016.
Deak, G. O., Fasel, I., & Movellan, J. (2001). The emergence of shared attention: Using robots to test developmental theories. In Proceedings of the First International Workshop on Epigenetic Robotics (Lund University Cognitive Studies, volume 85).
Hobson, R. P. (2005). What puts the jointness into joint attention? In Joint Attention: Communication and Other Minds: Issues in Philosophy and Psychology (p. 185).
Holthaus, P., Pitsch, K., & Wachsmuth, S. (2011). How can I help? Spatial attention strategies for a receptionist robot. International Journal of Social Robotics, 3, 383–393.
Kassner, M. P. & Patera, W. R. (2012). PUPIL: constructing the space of visual attention. PhD thesis, Massachusetts Institute of Technology.
Kendon, A. (1990). Conducting interaction: Patterns of behavior in focused encounters, volume 7. CUP Archive.
Kühnlein, P. & Stegmann, J. (2003). Empirical issues in deictic gesture: Referring to objects in simple identification tasks. Technical Report, SFB 360, Bielefeld University.
Kopp, S., Jung, B., Leßmann, N., & Wachsmuth, I. (2003). Max – a multimodal assistant in virtual reality construction. KI – Künstliche Intelligenz, 4(03):11–17.
Kranstedt, A., Lücking, A., Pfeiffer, T., Rieser, H., & Staudacher, M. (2006a). Measuring and reconstructing pointing in visual contexts. In Schlangen, D. & Fernandez, R. (Eds.), Proceedings of Brandial 2006 – The 10th Workshop on the Semantics and Pragmatics of Dialogue, pages 82–89, Potsdam: Universitätsverlag Potsdam.
Kranstedt, A., Lücking, A., Pfeiffer, T., Rieser, H., & Wachsmuth, I. (2006b). Deictic object reference in task-oriented dialogue. In Rickheit, G. & Wachsmuth, I. (Eds.), Situated Communication, pages 155–207. Berlin: Mouton de Gruyter.
Kranstedt, A., Lücking, A., Pfeiffer, T., Rieser, H., & Wachsmuth, I. (2006c). Deixis: How to determine demonstrated objects using a pointing cone. In Gibet, S., Courty, N., & Kamp, J.-F. (Eds.), Gesture Workshop 2005, LNAI 3881, pages 300–311. Berlin/Heidelberg: Springer-Verlag.
Landis, J. R. & Koch, G. G. (1977). The measurement of observer agreement for categorical data. Biometrics, 33(1), 159–174.
Lücking, A., Pfeiffer, T., & Rieser, H. (2015). Pointing and reference reconsidered. Journal of Pragmatics, 77, 56–79.
McNeill, D. (1992). Hand and Mind: What Gestures Reveal about Thought. University of Chicago Press, Chicago.
Microsoft (2016). Kinect for Windows Website. WWW: [URL], last checked October 2016.
NaturalPoint, Inc. (2016). OptiTrack Motion Capture Systems Company Website. WWW: [URL], last checked October 2016.
Pfeiffer, T. (2010). Understanding Multimodal Deixis with Gaze and Gesture in Conversational Interfaces. Doctoral dissertation (Dr. rer. nat.), Bielefeld University, Bielefeld, Germany.
Pfeiffer, T. (2011). Interaction between speech and gesture: Strategies for pointing to distant objects. In E. Efthimiou & G. Kouroupetroglou (Eds.), Gestures in Embodied Communication and Human-Computer Interaction, 9th International Gesture Workshop, GW 2011, pages 109–112. Athens: National and Kapodistrian University of Athens.
Pfeiffer, T. (2012). Using virtual reality technology in linguistic research. In Proceedings of the IEEE Virtual Reality 2012, pages 83–84, Orange County, CA, USA: IEEE.
Pfeiffer, T. (2013a). Documentation of gestures with data gloves. In C. Müller, A. Cienki, E. Fricke, S. Ladewig, D. McNeill & S. Teßendorf (Eds.), Handbücher zur Sprach- und Kommunikationswissenschaft / Handbooks of Linguistics and Communication Science, volume 1 (pp. 868–879). Berlin: Mouton de Gruyter.
Pfeiffer, T. (2013b). Documentation of gestures with motion capture. In C. Müller, A. Cienki, E. Fricke, S. Ladewig, D. McNeill & S. Teßendorf (Eds.), Handbücher zur Sprach- und Kommunikationswissenschaft / Handbooks of Linguistics and Communication Science, volume 1 (pp. 857–868). Berlin: Mouton de Gruyter.
Pfeiffer, T., Hofmann, F., Hahn, F., Rieser, H., & Röpke, I. (2013). Gesture semantics reconstruction based on motion capturing and complex event processing: A circular shape example. In M. Eskenazi, M. Strube, B. Di Eugenio & J. D. Williams (Eds.), Proceedings of the SIGDIAL 2013 Conference (pp. 270–279). Metz: Association for Computational Linguistics.
Pfeiffer, T., Kranstedt, A., & Lücking, A. (2006). Sprach-Gestik Experimente mit IADE, dem Interactive Augmented Data Explorer [Speech-gesture experiments with IADE, the Interactive Augmented Data Explorer]. In S. Müller & G. Zachmann (Eds.), Dritter Workshop Virtuelle und Erweiterte Realität der GI-Fachgruppe VR/AR, pages 61–72. Aachen: Shaker.
Pfeiffer, T. & Renner, P. (2014). EyeSee3D: A low-cost approach for analyzing mobile 3D eye-tracking data using computer vision and augmented reality technology. In Proceedings of the Symposium on Eye Tracking Research and Applications, ETRA '14, pp. 195–202. New York: ACM.
Pfeiffer, T., Renner, P., & Pfeiffer-Leßmann, N. (2016). EyeSee3D 2.0: Model-based real-time analysis of mobile eye-tracking in static and dynamic three-dimensional scenes. In Proceedings of the Ninth Biennial ACM Symposium on Eye Tracking Research & Applications, pp. 189–196. New York: ACM Press.
Renner, P., Pfeiffer, T., & Pfeiffer-Leßmann, N. (2015). Automatic analysis of a mobile dual eye-tracking study on joint attention. Abstracts of the 18th European Conference on Eye Movements, page 116.
Renner, P., Pfeiffer, T., & Wachsmuth, I. (2014). Spatial references with gaze and pointing in shared space of humans and robots. In C. Freksa, B. Nebel, M. Hegarty & T. Barkowsky (Eds.), Spatial Cognition IX, volume 8684 of Lecture Notes in Computer Science (pp. 121–136). Springer.
Rickheit, G. & Strohner, H. (1993). Grundlagen der kognitiven Sprachverarbeitung: Modelle, Methoden, Ergebnisse. Francke.
SensoMotoric Instruments GmbH (2016). SensoMotoric Instruments GmbH Company Website. WWW: [URL], last checked October 2016.
Tobii AB (2016). Tobii. WWW: [URL], last checked October 2016.
Tomasello, M., Carpenter, M., Call, J., Behne, T., & Moll, H. (2005). Understanding and sharing intentions: The origins of cultural cognition. Behavioral and Brain Sciences, 28(5), 675–691.
Vicon Motion Systems Ltd. (2016). VICON Company Website. WWW: [URL], last checked October 2016.
