Evaluating a transparent and interpretable approach to stance detection using linguistic markers in social media data

Reveilhac, Maud; Schneider, Gerold

doi:10.1075/ijcl.24132.rev

Article published in: Reproducibility, Replicability, and Robustness in Corpus Linguistics
Edited by Martin Schweinberger and Michael Haugh
[International Journal of Corpus Linguistics 30:2] 2025
pp. 195–233

Maud Reveilhac | LUT University

Gerold Schneider | University of Zurich

Available under the Creative Commons Attribution (CC BY) 4.0 license.

For any use beyond this license, please contact the publisher at rights@benjamins.nl.

Open Access publication of this article was funded through a Transformative Agreement with University of Zurich.

Published online: 12 September 2025

Abstract

Our study focuses on replicability, which entails researchers’ ability to achieve similar results to a prior study using identical methods but a different yet comparable dataset. We address the challenge of stance detection (determining whether a document is “favorable,” “against,” or “neutral” toward a target), building on prior research underscoring the value of linguistic markers as complementary features for sentiment detection that enable more accurate stance classification. We utilize the Stance in Replies and Quotes (SRQ) dataset, which contains annotated discussion-based responses. Employing a rule-based methodology that emphasizes linguistic features, we examine whether the classification accuracy remains within a similar error margin as observed in a previous study of another dataset. Consistency is a necessary condition for robustness and generalizability, ultimately enhancing trust in the methodology. The replication of the model and its adaptability to the new data context demonstrate that it is competitive compared to existing machine learning studies.

Keywords: replicability, robustness, transferability, stance detection, social media

Article outline

1. Introduction: Reproducibility and robustness in stance detection
2. Theoretical and methodological background
- 2.1 Replicability and related concepts
  - 2.1.1 Replicability: Same method, different data
  - 2.1.2 Reproducibility: Same data, same method
  - 2.1.3 Robustness: Same data, different method
- 2.2 Replicability in stance detection
- 2.3 Domain transferability as a robustness check
3. Data and method
- 3.1 The Stance in Replies and Quotes (SRQ) dataset
- 3.2 The creation of custom dictionaries
- 3.3 The replication of a classification process
4. Results
- 4.1 Distribution and importance of features on training dataset
- 4.2 Classification results
- 4.3 Error analysis
- 4.4 Results on the test dataset and examples
5. Discussion of the main findings and suggestions for improvements
6. Concluding remarks
Notes
References

References

Arnold, T., & Tilton, L. (2022). Wrappers around Stanford CoreNLP tools (version 0.4–2) [R package]. [URL]

Baker, M. (2016). 1,500 scientists lift the lid on reproducibility. Nature, 533, 452–454.

Benoit, K. (2018). LIWCalike [R package]. [URL]

Bettis, R. A., Helfat, C. E., & Shaver, J. M. (2016). The necessity, logic, and forms of replication. Strategic Management Journal, 37(11), 2193–2203.

Blitzer, J., Dredze, M., & Pereira, F. (2007). Biographies, Bollywood, boom-boxes and blenders: Domain adaptation for sentiment classification. In A. Zaenen & A. van den Bosch (Eds.), Proceedings of the 45th annual meeting of the Association of Computational Linguistics (pp. 440–447). Association for Computational Linguistics. [URL]

Buntain, C., & Golbeck, J. (2017). Automatically identifying fake news in popular Twitter threads. In 2017 IEEE international conference on SmartCloud (pp. 208–215). Institute of Electrical and Electronics Engineers.

Coutellec, L. (2019). Ethics and scientific integrity in biomedical research. Debates on trust, robustness, and relevance. Handbook of research ethics and scientific integrity, 1–14. HAL Open Science.

Crible, L. (2022). Studying discourse from corpus and experimental data: Bridging the methodological gap. Discours, 30. [URL]

Dipper, S. (2008). Theory-driven and corpus-driven computational linguistics, and the use of corpora. In A. Lüdeling & M. Kytö (Eds.), Corpus linguistics. An international handbook (pp. 68–96). Mouton de Gruyter.

Ehret, K., & Taboada, M. (2020). The interplay of complexity and subjectivity in opinionated discourse. Discourse Studies, 23(2), 141–165.

Finkel, J. R., Grenager, T., & Manning, C. (2005). Incorporating non-local information into information extraction systems by Gibbs sampling. In K. Knight, H. T. Ng, & K. Oflazer (Eds.), Proceedings of the 43rd annual meeting of the Association for Computational Linguistics (ACL’05). Association for Computational Linguistics.

Funer, F. (2022). The deception of certainty: How non-interpretable machine learning outcomes challenge the epistemic authority of physicians. A deliberative-relational approach. Medicine, Health Care and Philosophy, 25, 167–178.

Ghosh, S., Singhania, P., Singh, S., Rudra, K., & Ghosh, S. (2019). Stance detection in web and social media: A comparative study. In F. Crestani, M. Braschler, J. Savoy, A. Rauber, H. Müller, D. E. Losada, G. H. Bürki, L. Cappellato, & N. Ferro (Eds.), Experimental IR meets multilinguality, multimodality, and interaction: 10th international conference of the Cross-Language Evaluation Forum (CLEF) Association (pp. 75–87). Springer.

Gillings, M., Mautner, G., & Baker, P. (2023). Corpus-assisted discourse studies. Cambridge University Press.

Greenland, S., Senn, S. J., Rothman, K. J., Carlin, J. B., Poole, C., Goodman, S. N., & Altman, D. G. (2016). Statistical tests, P values, confidence intervals, and power: A guide to misinterpretations. European Journal of Epidemiology, 31, 337–350.

Grieve, J. (2021). Observation, experimentation, and replication in linguistics. Linguistics, 59(5), 1343–1356.

Grimmer, J., Roberts, M. E., & Stewart, B. M. (2022). Text as data: A new framework for machine learning and the social sciences. Princeton University Press.

Hartmann, J., Heitmann, M., Siebert, C., & Schamp, C. (2023). More than a feeling: Accuracy and application of sentiment analysis. International Journal of Research in Marketing, 40(1), 75–87.

Joseph, K., Shugars, S., Gallagher, R., Green, J., Mathé, A. Q., An, Z., & Lazer, D. (2021). (Mis)Alignment between stance expressed in social media data and public opinion surveys. In M.-F. Moens, X. Huang, L. Specia, & S. W.-T. Yih (Eds.), Proceedings of the 2021 conference on empirical methods in Natural Language Processing (pp. 312–324). Association for Computational Linguistics.

Kilgarriff, A. (2005). Language is never, ever, ever, random. Corpus Linguistics and Linguistic Theory, 1(2), 263–276.

Küçük, D., & Can, F. (2020). Stance detection: A survey. ACM Computing Surveys (CSUR), 53(1), 1–37.

Lamprecht, A.-L., Garcia, L., Kuzak, M., Martinez, C., Arcila, R., Martin Del Pico, E., Dominguez Del Angel, V., van de Sandt, S., Ison, J., Martinez, P. A., McQuilton, P., Valencia, A., Harrow, J., Psomopoulos, F., Gelpi, J. L., Chue Hong, N., Goble, C., & Capella-Gutierrez, S. (2020). Towards FAIR principles for research software. Data Science, 3(1), 37–59.

Levshina, N. (2015). How to do linguistics with R: Data exploration and statistical analysis. John Benjamins.

Lewandowsky, S., & Oberauer, K. (2020). Low replicability can support robust and efficient science. Nature Communications, 11, 358.

Li, S., & Zong, C. (2008). Multi-domain adaptation for sentiment classification: Using multiple classifier combining methods. In Proceedings of the 2008 international conference on Natural Language Processing and knowledge engineering (pp. 1–8). Institute of Electrical and Electronics Engineers.

Manning, C. D., Surdeanu, M., Bauer, J., Finkel, J. R., Bethard, S., & McClosky, D. (2014). The Stanford CoreNLP natural language processing toolkit. In K. Bontcheva & J. Zhu (Eds.), Proceedings of the 52nd annual meeting of the association for computational linguistics: System demonstrations (pp. 55–60).

Mohammad, S. M., & Turney, P. D. (2013). NRC word-emotion association lexicon (version 0.92). National Research Council, Canada. [URL]

Munafò, M. R., & Smith, G. D. (2018). Robust research needs many lines of evidence. Nature, 553, 399–401.

Ng, L. H. X., & Carley, K. M. (2022). Is my stance the same as your stance? A cross validation study of stance detection datasets. Information Processing & Management, 59(6), 103070.

Nosek, B. A., Hardwicke, T. E., Moshontz, H., Allard, A., Corker, K. S., Dreber, A., Fidler, F., Hilgard, J., Struhl, M. K., Nuijten, M. B., Rohrer, J. M., Romero, F., Scheel, A. M., Scherer, L. D., Schönbrodt, F. D., & Vazire, S. (2022). Replicability, robustness, and reproducibility in psychological science. Annual Review of Psychology, 73, 719–748.

Pamungkas, E. W., Basile, V., & Patti, V. (2019). Stance classification for rumour analysis in Twitter: Exploiting affective information and conversation structure. arXiv preprint, 1901.01911.

Peels, R., & Bouter, L. (2023). Replication and trustworthiness. Accountability in Research, 30(2), 77–87.

Pennebaker, J. W., Booth, R. J., & Francis, M. E. (2007). Linguistic Inquiry and Word Count: LIWC [Computer software]. [URL]

Popper, K. (1959). The logic of scientific discovery. Hutchinson.

Reveilhac, M., & Schneider, G. (2023). Replicable semi-supervised approaches to state-of-the-art stance detection of tweets. Information Processing & Management, 60(2), 103199.

Schiller, B., Daxenberger, J., & Gurevych, I. (2021). Stance detection benchmark: How robust is your stance detection? KI-Künstliche Intelligenz, 35(3), 329–341.

Schneider, G., Hundt, M., & Oppliger, R. (2016). Part-of-speech in historical corpora: Tagger evaluation and ensemble systems on ARCHER. In S. Dipper, F. Neubarth, & H. Zinsmeister (Eds.), Proceedings of the 13th conference on Natural Language Processing, KONVENS 2016 (pp. 256–264). Bochumer Linguistische Arbeitsberichte.

Schneider, G., & Lauber, M. (2019). Statistics for linguists: A patient, slow-paced introduction to statistics and to the programming language R. University of Zurich.

Schreiber-Gregory, D. (2018). Regulation techniques for multicollinearity: Lasso, ridge, and elastic nets. Proceedings of Western users of SAS software conferences 2018. [URL]

Schwab, S., Janiaud, P., Dayan, M., Amrhein, V., Panczak, R., Palagi, P. M., Hemkens, L. G., Ramon, M., Rothen, N., Senn, S., Furrer, E., & Held, L. (2022). Ten simple rules for good research practice. PLOS Computational Biology, 18(6), e1010139.

Sönning, L., & Werner, V. (2021). The replication crisis, scientific revolutions, and linguistics. Linguistics, 59(5), 1179–1206.

Taboada, M. (2016). Sentiment analysis: An overview from linguistics. Annual Review of Linguistics, 2, 325–347.

Tausczik, Y. R., & Pennebaker, J. W. (2010). The psychological meaning of words: LIWC and computerized text analysis methods. Journal of Language and Social Psychology, 29(1), 24–54.

Tognini-Bonelli, E. (2001). Corpus linguistics at work. John Benjamins.

Vajjala, S., & Balasubramaniam, R. (2022). What do we really know about state of the art NER? In N. Calzolari, F. Béchet, P. Blache, K. Choukri, C. Cieri, T. Declerck, S. Goggi, H. Isahara, B. Maegaard, J. Mariani, H. Mazo, J. Odijk, & S. Piperidis (Eds.), Proceedings of the thirteenth Language Resources and Evaluation Conference (LREC) (pp. 5983–5993). [URL]

Villa-Cox, R., Kumar, S., Babcock, M., & Carley, K. M. (2020). Stance in Replies and Quotes (SRQ): A new dataset for learning stance in Twitter conversations. arXiv preprint, 2006.00691.

Wilkinson, M. D., Dumontier, M., Aalbersberg, I. J., Appleton, G., Axton, M., Baak, A., Blomberg, N., Boiten, J.-W., da Silva Santos, L. B., Bourne, P. E., Bouwman, J., Brookes, A. J., Clark, T., Crosas, M., Dillo, I., Dumon, O., Edmunds, S., Evelo, C. T., Finkers, R., Gonzalez-Beltran, A., Gray, A. J. G., Groth, P., Goble, C., Grethe, J. S., Heringa, J., ’t Hoen, P. A. C., Hooft, R., Kuhn, T., Kok, R., Kok, J., Lusher, S. J., Martone, M. E., Mons, A., Packer, A. L., Persson, B., Rocca-Serra, P., Roos, M., van Schaik, R., Sansone, S.-A., Schultes, E., Sengstag, T., Slater, T., Strawn, G., Swertz, M. A., Thompson, M., van der Lei, J., van Mulligen, E., Velterop, J., Waagmeester, A., Wittenburg, P., Wolstencroft, K., Zhao, J., & Mons, B. (2016). The FAIR guiding principles for scientific data management and stewardship. Scientific Data, 3, 160018.

Wilson, M. (2009). Quality matters: Correctness, robustness and reliability. Overload, 17(93). [URL]

Winter, B. (2019). Statistics for linguists: An introduction using R. Routledge.

Winter, B., & Grice, M. (2021). Independence and generalizability in linguistics. Linguistics, 59(5), 1251–1277.

Young, L., & Soroka, S. (2012). Affective news: The automated coding of sentiment in political texts. Political Communication, 29(2), 205–231.

Zojaji, Z., & Tork Ladani, B. (2022). Adaptive cost-sensitive stance classification model for rumor detection in social networks. Social Network Analysis and Mining, 12, 134.