Article published in: Reproducibility, Replicability, and Robustness in Corpus Linguistics
Edited by Martin Schweinberger and Michael Haugh
[International Journal of Corpus Linguistics 30:2] 2025
pp. 195–233
Evaluating a transparent and interpretable approach to stance detection using linguistic markers in social media data
Available under the Creative Commons Attribution (CC BY) 4.0 license.
For any use beyond this license, please contact the publisher at rights@benjamins.nl.
Open Access publication of this article was funded through a Transformative Agreement with University of Zurich.
Published online: 12 September 2025
https://doi.org/10.1075/ijcl.24132.rev
Abstract
Our study focuses on replicability, which entails researchers’ ability to achieve similar results to a prior study
using identical methods but a different yet comparable dataset. We address the challenge of stance detection (determining whether
a document is “favorable,” “against,” or “neutral” toward a target), building on prior research underscoring the value of
linguistic markers as complementary features for sentiment detection that enable more accurate stance classification. We utilize
the Stance in Replies and Quotes (SRQ) dataset, which contains annotated discussion-based responses. Employing a rule-based
methodology that emphasizes linguistic features, we examine whether the classification accuracy remains within a similar error
margin as observed in a previous study of another dataset. Consistency is a necessary condition for robustness and
generalizability, ultimately enhancing trust in the methodology. The replication of the model and its adaptability to the new data
context demonstrate that it is competitive compared to existing machine learning studies.
Keywords: replicability, robustness, transferability, stance detection, social media
Article outline
- 1. Introduction: Reproducibility and robustness in stance detection
- 2. Theoretical and methodological background
- 2.1 Replicability and related concepts
- 2.1.1 Replicability: Same method, different data
- 2.1.2 Reproducibility: Same data, same method
- 2.1.3 Robustness: Same data, different method
- 2.2 Replicability in stance detection
- 2.3 Domain transferability as a robustness check
- 3. Data and method
- 3.1 The Stance in Replies and Quotes (SRQ) dataset
- 3.2 The creation of custom dictionaries
- 3.3 The replication of a classification process
- 4. Results
- 4.1 Distribution and importance of features on training dataset
- 4.2 Classification results
- 4.3 Error analysis
- 4.4 Results on the test dataset and examples
- 5. Discussion of the main findings and suggestions for improvements
- 6. Concluding remarks
- Notes
References
Arnold, T., & Tilton, L. (2022). Wrappers
around Stanford CoreNLP tools (version 0.4-2) [R
package]. [URL]
Benoit, K. (2018). LIWCalike [R
package]. [URL]
Bettis, R. A., Helfat, C. E., & Shaver, J. M. (2016). The
necessity, logic, and forms of replication. Strategic Management
Journal, 37(11), 2193–2203.
Blitzer, J., Dredze, M., & Pereira, F. (2007). Biographies,
Bollywood, boom-boxes and blenders: Domain adaptation for sentiment
classification. In A. Zaenen & A. van den Bosch (Eds.), Proceedings
of the 45th annual meeting of the Association of Computational
Linguistics (pp. 440–447). Association for Computational Linguistics. [URL]
Buntain, C., & Golbeck, J. (2017). Automatically
identifying fake news in popular Twitter threads. In 2017 IEEE
international conference on
SmartCloud (pp. 208–215). Institute of Electrical and Electronics Engineers.
Coutellec, L. (2019). Ethics
and scientific integrity in biomedical research. Debates on trust, robustness, and
relevance. Handbook of research ethics and scientific
integrity, 1–14. HAL Open Science.
Crible, L. (2022). Studying
discourse from corpus and experimental data: Bridging the methodological
gap. Discours, 30. [URL]
Dipper, S. (2008). Theory-driven
and corpus-driven computational linguistics, and the use of
corpora. In A. Lüdeling & M. Kytö (Eds.), Corpus
linguistics. An international
handbook (pp. 68–96). Mouton de Gruyter.
Ehret, K., & Taboada, M. (2020). The
interplay of complexity and subjectivity in opinionated discourse. Discourse
Studies, 23(2), 141–165.
Finkel, J. R., Grenager, T., & Manning, C. (2005). Incorporating
non-local information into information extraction systems by Gibbs
sampling. In K. Knight, H. T. Ng, & K. Oflazer (Eds.), Proceedings
of the 43rd annual meeting of the Association for Computational Linguistics
(ACL’05). Association for Computational Linguistics.
Funer, F. (2022). The
deception of certainty: How non-interpretable machine learning outcomes challenge the epistemic authority of physicians. A
deliberative-relational approach. Medicine, Health Care and
Philosophy, 25, 167–178.
Ghosh, S., Singhania, P., Singh, S., Rudra, K., & Ghosh, S. (2019). Stance
detection in web and social media: A comparative study. In F. Crestani, M. Braschler, J. Savoy, A. Rauber, H. Müller, D. E. Losada, G. H. Bürki, L. Cappellato, & N. Ferro (Eds.), Experimental
IR meets multilinguality, multimodality, and interaction: 10th international conference of the Cross-Language Evaluation Forum
(CLEF) Association (pp. 75–87). Springer.
Gillings, M., Mautner, G., & Baker, P. (2023). Corpus-assisted
discourse studies. Cambridge University Press.
Greenland, S., Senn, S. J., Rothman, K. J., Carlin, J. B., Poole, C., Goodman, S. N., & Altman, D. G. (2016). Statistical
tests, P values, confidence intervals, and power: A guide to misinterpretations. European
Journal of
Epidemiology, 31, 337–350.
Grieve, J. (2021). Observation,
experimentation, and replication in
linguistics. Linguistics, 59(5), 1343–1356.
Grimmer, J., Roberts, M. E., & Stewart, B. M. (2022). Text
as data: A new framework for machine learning and the social sciences. Princeton University Press.
Hartmann, J., Heitmann, M., Siebert, C., & Schamp, C. (2023). More
than a feeling: Accuracy and application of sentiment analysis. International Journal of
Research in
Marketing, 40(1), 75–87.
Joseph, K., Shugars, S., Gallagher, R., Green, J., Mathé, A. Q., An, Z., & Lazer, D. (2021). (Mis)Alignment
between stance expressed in social media data and public opinion
surveys. In M.-F. Moens, X. Huang, L. Specia, & S. W.-T. Yih (Eds.), Proceedings
of the 2021 conference on empirical methods in Natural Language
Processing (pp. 312–324). Association for Computational Linguistics.
Kilgarriff, A. (2005). Language
is never, ever, ever, random. Corpus Linguistics and Linguistic
Theory, 1(2), 263–276.
Lamprecht, A.-L., Garcia, L., Kuzak, M., Martinez, C., Arcila, R., Martin Del Pico, E., Dominguez Del Angel, V., van de Sandt, S., Ison, J., Martinez, P. A., McQuilton, P., Valencia, A., Harrow, J., Psomopoulos, F., Gelpi, J. L., Chue Hong, N., Goble, C., & Capella-Gutierrez, S. (2020). Towards
FAIR principles for research software. Data
Science, 3(1), 37–59.
Levshina, N. (2015). How
to do linguistics with R: Data exploration and statistical analysis. John Benjamins.
Lewandowsky, S., & Oberauer, K. (2020). Low
replicability can support robust and efficient science. Nature
Communications, 11, 358.
Li, S., & Zong, C. (2008). Multi-domain
adaptation for sentiment classification: Using multiple classifier combining
methods. In Proceedings of the 2008 international conference on
Natural Language Processing and knowledge
engineering (pp. 1–8). Institute of Electrical and Electronics Engineers.
Manning, C. D., Surdeanu, M., Bauer, J., Finkel, J. R., Bethard, S., & McClosky, D. (2014). The
Stanford CoreNLP natural language processing toolkit. In K. Bontcheva & J. Zhu (Eds.), Proceedings
of the 52nd annual meeting of the Association for Computational Linguistics: System
demonstrations (pp. 55–60).
Mohammad, S. M., & Turney, P. D. (2013). NRC
word-emotion association lexicon (version 0.92). National Research Council, Canada. [URL]
Munafò, M. R., & Smith, G. D. (2018). Robust
research needs many lines of
evidence. Nature, 553, 399–401.
Ng, L. H. X., & Carley, K. M. (2022). Is
my stance the same as your stance? A cross validation study of stance detection
datasets. Information Processing &
Management, 59(6), 103070.
Nosek, B. A., Hardwicke, T. E., Moshontz, H., Allard, A., Corker, K. S., Dreber, A., Fidler, F., Hilgard, J., Struhl, M. K., Nuijten, M. B., Rohrer, J. M., Romero, F., Scheel, A. M., Scherer, L. D., Schönbrodt, F. D., & Vazire, S. (2022). Replicability,
robustness, and reproducibility in psychological science. Annual Review of
Psychology, 73, 719–748.
Pamungkas, E. W., Basile, V., & Patti, V. (2019). Stance
classification for rumour analysis in Twitter: Exploiting affective information and conversation
structure. arXiv preprint, 1901.01911.
Peels, R., & Bouter, L. (2023). Replication
and trustworthiness. Accountability in
Research, 30(2), 77–87.
Pennebaker, J. W., Booth, R. J., & Francis, M. E. (2007). Linguistic
Inquiry and Word Count: LIWC [Computer software]. [URL]
Reveilhac, M., & Schneider, G. (2023). Replicable
semi-supervised approaches to state-of-the-art stance detection of tweets. Information
Processing &
Management, 60(2), 103199.
Schiller, B., Daxenberger, J., & Gurevych, I. (2021). Stance
detection benchmark: How robust is your stance detection? KI-Künstliche
Intelligenz, 35(3), 329–341.
Schneider, G., Hundt, M., & Oppliger, R. (2016). Part-of-speech
in historical corpora: Tagger evaluation and ensemble systems on
ARCHER. In S. Dipper, F. Neubarth, & H. Zinsmeister (Eds.), Proceedings
of the 13th conference on Natural Language Processing, KONVENS
2016 (pp. 256–264). Bochumer Linguistische Arbeitsberichte.
Schneider, G., & Lauber, M. (2019). Statistics
for linguists: A patient, slow-paced introduction to statistics and to the programming language
R. University of Zurich.
Schreiber-Gregory, D. (2018). Regulation
techniques for multicollinearity: Lasso, ridge, and elastic nets. Proceedings of Western users
of SAS software conferences 2018. [URL]
Schwab, S., Janiaud, P., Dayan, M., Amrhein, V., Panczak, R., Palagi, P. M., Hemkens, L. G., Ramon, M., Rothen, N., Senn, S., Furrer, E., & Held, L. (2022). Ten
simple rules for good research practice. PLOS Computational
Biology, 18(6), e1010139.
Sönning, L., & Werner, V. (2021). The
replication crisis, scientific revolutions, and
linguistics. Linguistics, 59(5), 1179–1206.
Taboada, M. (2016). Sentiment
analysis: An overview from linguistics. Annual Review of
Linguistics, 2, 325–347.
Tausczik, Y. R., & Pennebaker, J. W. (2010). The
psychological meaning of words: LIWC and computerized text analysis methods. Journal of
Language and Social
Psychology, 29(1), 24–54.
Tognini-Bonelli, E. (2001). Corpus
linguistics at work. John Benjamins.
Vajjala, S., & Balasubramaniam, R. (2022). What
do we really know about state of the art NER? In N. Calzolari, F. Béchet, P. Blache, K. Choukri, C. Cieri, T. Declerck, S. Goggi, H. Isahara, B. Maegaard, J. Mariani, H. Mazo, J. Odijk, & S. Piperidis (Eds.), Proceedings
of the thirteenth Language Resources and Evaluation Conference
(LREC) (pp. 5983–5993). [URL]
Villa-Cox, R., Kumar, S., Babcock, M., & Carley, K. M. (2020). Stance
in Replies and Quotes (SRQ): A new dataset for learning stance in Twitter conversations. arXiv
preprint, 2006.00691.
Wilkinson, M. D., Dumontier, M., Aalbersberg, I. J., Appleton, G., Axton, M., Baak, A., Blomberg, N., Boiten, J.-W., da Silva Santos, L. B., Bourne, P. E., Bouwman, J., Brookes, A. J., Clark, T., Crosas, M., Dillo, I., Dumon, O., Edmunds, S., Evelo, C. T., Finkers, R., Gonzalez-Beltran, A., Gray, A. J. G., Groth, P., Goble, C., Grethe, J. S., Heringa, J., ’t Hoen, P. A. C., Hooft, R., Kuhn, T., Kok, R., Kok, J., Lusher, S. J., Martone, M. E., Mons, A., Packer, A. L., Persson, B., Rocca-Serra, P., Roos, M., van Schaik, R., Sansone, S.-A., Schultes, E., Sengstag, T., Slater, T., Strawn, G., Swertz, M. A., Thompson, M., van der Lei, J., van Mulligen, E., Velterop, J., Waagmeester, A., Wittenburg, P., Wolstencroft, K., Zhao, J., & Mons, B. (2016). The
FAIR guiding principles for scientific data management and stewardship. Scientific
Data, 3, 160018.
Wilson, M. (2009). Quality
matters: Correctness, robustness and
reliability. Overload, 17(93). [URL]
Winter, B., & Grice, M. (2021). Independence
and generalizability in
linguistics. Linguistics, 59(5), 1251–1277.
