Article published in: Multidisciplinary Perspectives on Human-AI Team Trust
Edited by Nicolò Brandizzi, Morgan Elizabeth Bailey, Carolina Centeio Jorge, Myke C. Cohen, Francesco Frattolillo and Alan Richard Wagner
[Interaction Studies 26:2] 2025
pp. 267–297
Exploratory models of human-AI teams
Leveraging human digital twins to investigate trust development
Published online: 27 February 2026
https://doi.org/10.1075/is.24052.ngu
Abstract
As human-agent teaming (HAT) research continues to grow, computational methods for modeling HAT behaviors and
measuring HAT effectiveness also continue to develop. One emerging method leverages human digital twins (HDTs) to
approximate human behaviors and socio-emotional-cognitive reactions to AI-driven agent team members. To help HDT research
effectively model human trust in HATs, we offer two lines of insight. First, through a review of the HAT trust literature, we
identify key characteristics and attributes of trust that must be considered to properly conceptualize, model, and
measure trust. Through this review, we outline the theoretical foundations of trust needed for effective HDTs capable of emulating
human trust and offer guidance on where and how extant HAT research should translate into HDT modeling and future research.
Second, through causal analyses of archival team communication data from a HAT experiment, we supplement these theoretical
foundations with data-driven insights into the trust-related language HDTs may need to produce to emulate human trust effectively.
Finally, we discuss implications of these combined theoretical and empirical insights for future HDT research, highlighting the
necessity of ongoing validation against human behaviors and the refinement of computational methods. This paper ultimately aims to
advance both the fidelity and applicability of HDTs in modeling nuanced human-agent trust dynamics, fostering more effective and
realistic human-agent collaborations.
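
To make the causal-analysis step more concrete, the following is a minimal sketch of how a causal graph over trust-related communication constructs could be learned from scored transcript windows. It assumes a NOTEARS-style structure learner from the open-source CausalNex library; the construct names, synthetic data, and pruning threshold are illustrative placeholders, not the study's actual variables or pipeline.

```python
# Minimal sketch: learning a causal structure over trust-related
# communication constructs with CausalNex (NOTEARS). Each row stands in
# for one team-communication window scored on hypothetical constructs;
# neither the variable names nor the thresholds come from the article.
import numpy as np
import pandas as pd
from causalnex.structure.notears import from_pandas

rng = np.random.default_rng(0)
n = 200  # number of communication windows (synthetic stand-in data)

# Hypothetical construct scores per window (e.g., from text classifiers).
df = pd.DataFrame({
    "empathy": rng.normal(size=n),
    "positive_affect": rng.normal(size=n),
    "shared_goals": rng.normal(size=n),
})
# Inject a simple dependency so the learner has something to recover.
df["reported_trust"] = (
    0.6 * df["empathy"] + 0.3 * df["positive_affect"]
    + rng.normal(scale=0.5, size=n)
)

# Learn a DAG over the constructs; prune weak edges for readability.
sm = from_pandas(df)
sm.remove_edges_below_threshold(0.2)

for u, v, w in sm.edges(data="weight"):
    print(f"{u} -> {v} (weight={w:.2f})")
```

In practice the construct scores would come from validated text classifiers rather than synthetic draws, and the learned edges would warrant the same caution about causal claims as any observational analysis.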
Article outline
- 1. Introduction
- 2. Theoretical foundations of trust in HATs
- 2.1 Defining and conceptualizing trust in HATs
- 2.2 Critical characteristics of trust for modeling
- Setting HDT initial trust in AI teammates
- Changes in trust over time
- Computational modeling of HDT trust over time (see the illustrative sketch after this outline)
- 2.3 Methods for measuring HAT trust
- Self-reported trust
- Behavioral trust
- Emerging techniques for measuring trust
- 3. Empirical examination of HAT trust
- 3.1 Data-driven insights for modeling HAT trust
- Measuring HAT trust using causal models
- Causal analysis results: Empathy constructs
- Causal analysis results: Socio-cognitive constructs
- Causal analysis results: Emotional constructs
- Causal analysis key takeaways
- 3.2 Preliminary modeling of HDT trust
- Preliminary HDT simulation limitations and implications
- 3.3 Operational implications for defense applications
- 4. Conclusion
- Contributions to HAT science
- Future directions
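
As a companion to the "Computational modeling of HDT trust over time" item above, the sketch below shows one common baseline for trust dynamics, not necessarily the model used in the article: trust as a bounded latent state updated after each teammate action, with a larger step for violations than for successes, consistent with the widely reported asymmetry that trust is easier to break than to build. All parameter values are illustrative placeholders.

```python
# Illustrative HDT trust-dynamics baseline: trust moves toward 1.0 after
# a reliable teammate action and toward 0.0 after a violation, with a
# larger learning rate for violations (trust is easier to break than to
# build). Parameters are placeholders, not values from the article.
from dataclasses import dataclass

@dataclass
class TrustState:
    trust: float = 0.5       # initial (dispositional) trust in [0, 1]
    gain_rate: float = 0.10  # update step after a reliable action
    loss_rate: float = 0.30  # larger step after a trust violation

    def update(self, reliable: bool) -> float:
        """Move trust toward 1.0 on success, toward 0.0 on violation."""
        if reliable:
            self.trust += self.gain_rate * (1.0 - self.trust)
        else:
            self.trust -= self.loss_rate * self.trust
        return self.trust

# Example: one violation amid otherwise reliable behavior.
state = TrustState()
for outcome in [True, True, False, True, True]:
    print(f"reliable={outcome!s:5}  trust={state.update(outcome):.3f}")
```

An HDT would expose this latent state to downstream behavior generation, for example by conditioning its compliance, verification, or communication behavior on the current trust value.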
