Exploring trust in AI-supported military teams using sentiment analysis

Kucukosmanoglu, Murat; Johnson, Craig; Pollard, Kimberly; Chhan, David; Lakhmani, Shan; Forster, Daniel; Conklin, Sarah; Brooks, Justin; Crowell, H. Philip; Krausman, Andrea

doi:10.1075/is.24046.kuc

Article published in: Multidisciplinary Perspectives on Human-AI Team Trust
Edited by Nicolo' Brandizzi, Morgan Elizabeth Bailey, Carolina Centeio Jorge, Myke C. Cohen, Francesco Frattolillo and Alan Richard Wagner
[Interaction Studies 26:2] 2025
pp. 229–266

Murat Kucukosmanoglu | D-Prime LLC

Craig Johnson | U.S. Army Combat Capabilities Development Command Army Research Laboratory

Kimberly Pollard | U.S. Army Combat Capabilities Development Command Army Research Laboratory

David Chhan | U.S. Army Combat Capabilities Development Command Army Research Laboratory

Shan Lakhmani | U.S. Army Combat Capabilities Development Command Army Research Laboratory

Daniel Forster | University of San Diego

Sarah Conklin | D-Prime LLC | University of Maryland, Baltimore County

Justin Brooks | D-Prime LLC | University of Maryland, Baltimore County

H. Philip Crowell | U.S. Army Combat Capabilities Development Command Army Research Laboratory

Andrea Krausman | U.S. Army Combat Capabilities Development Command Army Research Laboratory

Published online: 27 February 2026

https://doi.org/10.1075/is.24046.kuc

Abstract

Examining sentiment in team communications can provide information about trust among teammates. Natural language processing (NLP) models provide an efficient means of sentiment analysis. However, military teams and other professional teams use language that differs from the language NLP models are typically trained on, which can lead to inaccurate sentiment analysis. This study investigates the novel application of two advanced NLP models, DistilBERT and GPT-2, to sentiment analysis of expert military teams conducting AI-supported combat missions in a high-fidelity simulation environment. Our fine-tuning process improved sentiment classification accuracy. The sentiment measures also correlated with measures of team trust and trust in the AI systems, providing insight into the relationship between sentiment and trust in human-AI teaming scenarios. The generalized approach we describe may be useful for adapting sentiment analysis and NLP techniques to military teams, and may help measure trust dynamics and team states in human-machine integrated teams.

Keywords: sentiment analysis, natural language processing, DistilBERT, GPT-2, human-AI teams, military communication, team trust

Article outline

1. Introduction
- 1.1 Sentiment analysis in military teams
- 1.2 Trust in human-machine integrated teams
- 1.3 Communication sentiment and trust
- 1.4 The current study
2. Methodology
- 2.1 Dataset description
- 2.2 Model development and tuning
- 2.3 Transcription and cleaning
- 2.4 Data segmentation and annotation
- 2.5 Fine-tuning transformer models for sentiment analysis
- 2.6 Models comparison of one vs two annotators
- 2.7 Conversion to sentiment ratios
- 2.8 Trust questionnaires
  - 2.8.1 Team trust
  - 2.8.2 AITR and DTAS trust
- 2.9 Data analysis
  - 2.9.1 Team trust
  - 2.9.2 AITR trust
  - 2.9.3 DTAS trust
  - 2.9.4 Statistical analysis
3. Results
- 3.1 Analysis of team trust levels
- 3.2 Analysis of adaptive aided target recognition (AITR) trust
- 3.3 Analysis of dynamic task allocation software (DTAS) trust levels
- 3.4 Analysis of the relationship between team trust and AITR/DTAS trust
- 3.5 Qualitative examination
4. Discussion
- 4.1 Model tuning for the military context
- 4.2 Sentiment and trust
  - 4.2.1 Trust in teammates
  - 4.2.2 Trust in DTAS and AiTR
- 4.3 Limitations and future work
5. Conclusion
Acknowledgements
References
