Classifying literary genres: a methodological synergy of computational modelling and lexical semantics

Abdulfattah Omar

Abstract


ABSTRACT: The classification of literary genres has long been confined methodologically to philological methods and to what is commonly known as Vector Space Clustering (VSC). The problem has been exacerbated by the widening gap between computational theory and the traditional analysis of literary texts. To address it, the current study adopts a synergetic approach that brings together two established methods. First, a computational model of genre classification is used to identify concept-based, rather than word-bound, topics: texts are represented through the ‘bag of concepts’ (BOC) model together with sense-restricted knowledge and the meaningful links holding among concepts, and text classification is carried out through two model strands, explicit semantic analysis (ESA) and ConceptNet. Second, a contextual lexical semantic approach (CRUSE, 1986, 2000) is employed so that the contextual variability of word meanings and concepts can be handled within the confines of the target literary genres. The findings show that this composite approach of computational and semantic models improves performance in classifying literary genres, especially in delineating the links among each cluster’s member documents and in generalizing about their unifying genre. The study also has implications for digital libraries and archiving, where the classification of literary texts has often proven problematic for users and readers.

KEYWORDS: bag of concepts (BOC); ConceptNet; Explicit Semantic Analysis (ESA); genre classification; topic concepts; Vector Space Clustering (VSC).
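
The contrast between word-bound and concept-based representation mentioned in the abstract can be illustrated with a minimal sketch. It is not the study's implementation: the lexicon `TOY_LEXICON`, the helper names `bag_of_concepts` and `cosine`, and the sample sentences are all hypothetical, and a real system would derive word-to-concept mappings from a resource such as ESA or ConceptNet rather than from a hand-written table.

```python
from collections import Counter
from math import sqrt

# Hypothetical word-to-concept lexicon (an assumption for illustration);
# a real system would obtain such mappings from ESA or ConceptNet.
TOY_LEXICON = {
    "senator": "politics",
    "election": "politics",
    "ballot": "politics",
    "wizard": "magic",
    "spell": "magic",
    "dragon": "magic",
}

def bag_of_concepts(text, lexicon=TOY_LEXICON):
    """Count the concepts (not the surface words) evoked by a text's tokens."""
    tokens = text.lower().split()
    return Counter(lexicon[t] for t in tokens if t in lexicon)

def cosine(a, b):
    """Cosine similarity between two sparse concept-count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# doc_a and doc_b share no surface words, yet both evoke "politics".
doc_a = bag_of_concepts("the senator won the election")
doc_b = bag_of_concepts("every ballot was counted")
doc_c = bag_of_concepts("the wizard cast a spell on the dragon")
```

In this toy setting, `cosine(doc_a, doc_b)` is high even though the two sentences have no word in common, while `cosine(doc_a, doc_c)` is zero; a purely word-based vector space would not separate the three texts in this way.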

 





References


ADOLPHS, S.; KNIGHT, D. The Routledge Handbook of English Language and Digital Humanities. Taylor & Francis, 2020.

BAASNER, R.; ZENS, M. Methoden und Modelle der Literaturwissenschaft: Eine Einführung (Methods and Models of Literary Studies: An Introduction). Revised ed. Berlin: Erich Schmidt, 2005.

BAYM, N. The Norton Anthology of American Literature (7th ed.). New York; London: W.W. Norton & Co., 2007.

BELLEGARDA, J. Latent Semantic Mapping: Principles And Applications. Morgan & Claypool Publishers, 2008.

BENDIXEN, A. A Companion to the American Novel. 1st ed. Chichester, West Sussex: Wiley-Blackwell, 2012.

BERRY, D. M. Understanding Digital Humanities. Basingstoke: Palgrave Macmillan, 2012.

BHATIA, V. K. Analysing Genre: Language Use in Professional Settings. London: Longman, 2014.

BIBER, D. Spoken and Written Textual Dimensions in English: Resolving the Contradictory Findings. Language, 62(2), p. 384-413, 1986.

BLOTNER, J. The Modern American Political Novel, 1900-1960. Austin: University of Texas Press, 1966.

BLOTNER, J. The Political Novel. Norwood Editions, 1977.

BOELHOWER, W. Q. The Immigrant Novel as Genre. MELUS, 8(1), p. 3-13, 1981.

CARISSIMO, J. Yi-Fen Chou: White author under fire after using Asian pen name to be published more often. The Independent, 8 September 2015.

CASSUTO, L.; REISS, B. The Cambridge History of the American Novel. Cambridge: Cambridge University Press, 2011.

CHAKRABORTY, G.; PAGOLU, M. Text Mining and Analysis: Practical Methods, Examples, and Case Studies. SAS Institute, 2014.

CHAKRABORTY, G.; PAGOLU, M.; GARLA, S. Text Mining and Analysis: Practical Methods, Examples, and Case Studies Using SAS. Cary, North Carolina: SAS Institute, 2014.

CHESTER, D. The Fantasy Fiction Formula. Oxford University Press, 2016.

CLAYBAUGH, A. The Novel of Purpose: Literature and Social Reform in the Anglo-American World. Cornell University Press, 2018.

COULSON, S. Semantic Leaps: Frame-shifting and Conceptual Blending in Meaning Construction. Cambridge: Cambridge University Press, 2006.

CRUSE, D. A. Lexical Semantics. Cambridge: Cambridge University Press, 1986.

CRUSE, D. A. Meaning in Language: An Introduction to Semantics and Pragmatics. Oxford: Oxford University Press, 2000.

DOUGLAS, D. The multi-dimensional approach to linguistic analyses of genre variation: An overview of methodology and findings. Computers and the Humanities, 26(5-6), p. 331-345, 1992.

DUNN, J.; ARGAMON, S.; RASOOLI, A.; KUMAR, G. Profile-based authorship analysis. Digital Scholarship in the Humanities, 31(4), p. 689-710, 2016.

ERLICH, V. Russian Formalism: History – Doctrine. 3rd ed. New Haven; London: Yale University Press, 1981.

FANG, L.; MEHLITZ, M.; LI, F.; SHENG, H. Web Pages Clustering and Concepts Mining: An approach towards Intelligent Information Retrieval. Cybernetics and Intelligent Systems, IEEE Conference, 2006, p. 1-6.

FLOOD, A. White poet used Chinese pen name to gain entry into Best American Poetry. The Guardian, 8 September 2015.

FOWLER, A. Kinds of Literature: An Introduction to the Theory of Genres and Modes. Harvard University Press, 1982.

FRANCO, D. J. Ethnic American Literature: Comparing Chicano, Jewish, and African American Writing. Charlottesville; London: University of Virginia Press, 2006.

GABRILOVICH, E.; MARKOVITCH, S. Overcoming the Brittleness Bottleneck using Wikipedia: Enhancing Text Categorization with Encyclopedic Knowledge. Proceedings of the Twenty-First National Conference on Artificial Intelligence, 2006, p. 1301-1306.

GABRILOVICH, E.; MARKOVITCH, S. Computing Semantic Relatedness using Wikipedia-based Explicit Semantic Analysis. Proceedings of the 20th International Joint Conference on Artificial Intelligence, 2007, p. 1606-1611.

GELBUKH, A. Computational Linguistics and Intelligent Text Processing. Springer, 2007.

GOMAA, D. The Non-National in Contemporary American Literature: Ethnic Women Writers and Problematic Belongings. Palgrave Macmillan, 2016.

GRICE, H. Beginning Ethnic American Literatures. Manchester: Manchester University Press, 2001.

GRIFFITHS, T. L.; STEYVERS, M. Topics in Semantic Representation. Psychological Review, 114(2), p. 211-244, 2007.

HAMMOND, A.; BROOKE, J. A Tale of Two Cultures: Bringing Literary Analysis and Computational Linguistics Together. Proceedings of the Second Workshop on Computational Linguistics for Literature, Atlanta, Georgia, June 14, 2013.

HAN, J.; KAMBER, M. Data mining: concepts and techniques. San Francisco, Calif.; London: Morgan Kaufmann, 2001.

HAVASI, C.; SPEER, R.; ALONSO, J. ConceptNet 3: A Flexible Multilingual Semantic Network for Common Sense Knowledge. Recent Advances in Natural Language Processing, 2007, p. 27-29.

HOLMES, D. I. The Evolution of Stylometry in Humanities Scholarship. Literary and Linguistic Computing, 13(3), p. 111-117, 1998. doi:10.1093/llc/13.3.111

HOWE, I. Politics and the novel. New York; London: Columbia University Press, 1992.

JOACHIMS, T. Learning to Classify Text Using Support Vector Machines: Methods, Theory and Algorithms. Kluwer Academic Publishers, 2002.

JOCKERS, M. L. Machine-Classifying Novels and Plays by Genre. Retrieved from: https://www.stanford.edu/~mjockers/cgi-bin/drupal/node/27, 13 February 2009.

KAPLAN, A. The Social Construction of American Realism. Chicago: University of Chicago Press, 1992.

KARCZ, A. The Polish Formalist School and Russian Formalism. Rochester, N.Y.; Woodbridge: University of Rochester Press, 2002.

KESSLER, B.; NUNBERG, G.; SCHÜTZE, H. Automatic Detection of Text Genre. Proceedings of the Eighth Conference of the European Chapter of the Association for Computational Linguistics. Madrid, Spain, 1997.

KOPPEL, M.; ARGAMON, S.; SHIMONI, A. R. Automatically Categorizing Written Texts by Author Gender. Literary and Linguistic Computing, 17(4), p. 401-412, 2002. doi:10.1093/llc/17.4.401.

LIAO, S.-H.; CHU, P.-H.; HSIAO, P.-Y. Data Mining Techniques and Applications – A Decade Review from 2000 to 2011. Expert Systems with Applications, 39(12), p. 11303-11311, 2012.

LIU, H.; SINGH, P. ConceptNet: A Practical Commonsense Reasoning Tool-Kit. BT Technology Journal, 22(4), p. 211-226, 2004.

LOOKS, M.; LEVINE, A.; COVINGTON, G. A.; LOUI, R. P.; LOCKWOOD, J. W.; CHO, Y. H. Streaming Hierarchical Clustering for Concept Mining. Aerospace Conference, 2007 IEEE.

MAJKIĆ, Z. Big Data Integration Theory: Theory and Methods of Database Mappings, Programming Languages, and Semantics. Springer, 2014.

MANDELKER, A. Russian Formalism and the Objective Analysis of Sound in Poetry. The Slavic and East European Journal, 27(3), p. 327-338, 1983.

MANNING, C. D.; RAGHAVAN, P.; SCHÜTZE, H. An Introduction to Information Retrieval. Cambridge: Cambridge University Press, 2008.

MICKENBERG, J.; VALLONE, L. The Oxford Handbook of Children's Literature. Oxford University Press, 2012.

NELSON, E. S. Ethnic American Literature. California; Oxford: Greenwood Press, 2015.

OLMOS, R.; LEÓN, J. A.; JORGE-BOTANA, G.; ESCUDERO, I. Using latent semantic analysis to grade brief summaries: A study exploring texts at different academic levels. Literary and Linguistic Computing, 28(3), p. 388-403, 2013.

OMAR, A. Addressing Subjectivity and Replicability in Thematic Classification of Literary Texts: Using Cluster Analysis to Derive Taxonomies of Thematic Concepts in the Thomas Hardy’s Prose Fiction. Proceedings of the Chicago Colloquium on Digital Humanities and Computer Science, 1(2), p. 1-14, 2010.

OMAR, A. Addressing Subjectivity in Thematic Classification of Literary Texts: A Fresh Look at Thomas Hardy’s Prose Fiction. Berlin: Lambert, 2015.

OZGUR, Y. Empirical selection of nlp-driven document representations for text categorization. Syracuse University, 2006.

PREDELLI, S. Contexts: Meaning, Truth, and Use of Language. Oxford: Oxford University Press, 2005.

RAMSAY, S. In Praise of Pattern. TEXT Technology: the Journal of Computer Text Processing, 14(2), p. 177-190, 2005.

RAMSAY, S. Algorithmic Criticism. In: SIEMENS, R. G.; SCHREIBMAN, S. (eds.), A Companion to Digital Literary Studies. Malden, MA: Blackwell Publishers, 2007.

RAMSAY, S.; STEGER, S. Distinguished Speakers: Keyword Extraction and Critical Analysis with Virginia Woolf’s The Waves. Digital Humanities, Sorbonne, Paris, 2006.

RIESEN, K.; BUNKE, H. Graph classification and clustering based on vector space embedding. Singapore; London: World Scientific, 2010.

ROBERTSON, S. Understanding inverse document frequency: on theoretical arguments for IDF. Journal of Documentation, 60(5), p. 503-520, 2004.

SARICKS, J. G. The Readers' Advisory Guide to Genre Fiction. Chicago; London: American Library Association, 2009.

SHALABY, W.; ZADROZNY, W. Semantic Representation Using Explicit Concept Space Models. Proceedings of the 31st AAAI Conference on Artificial Intelligence, 2017, p. 4983-4984.

SPÄRCK JONES, K. A statistical interpretation of term specificity and its application in retrieval. Journal of Documentation, 28, p. 11-21, 1972.

SPEARE, M. E. The Political Novel: Its Development in England and in America. New York: Oxford University Press, 1924.

SPEER, R.; HAVASI, C. ConceptNet 5: A Large Semantic Network for Relational Knowledge. In: GUREVYCH, I.; KIM, J. (eds.), The People's Web Meets NLP: Collaboratively Constructed Language Resources. Springer, 2013, p. 161-176.

STEINER, P. Russian Formalism. In: SELDEN, R. (ed.), The Cambridge History of Literary Criticism. Cambridge: Cambridge University Press, Vol. 8, 1995, p. 11-29.

TOBIN, Y. The Prague School and its Legacy in Linguistics, Literature, Semiotics, Folklore, and the Arts. Amsterdam; Philadelphia: J. Benjamins, 1988.

UTAS, B. Genres in Persian Literature 900-1900. In: LINDBERG-WADA, G.; PETTERSSON A.; PETERSSON, M.; HELGESSON, S. (eds.), Literary history: towards a global perspective. Berlin; New York: W. de Gruyter, Vol. 2, p. 199-242, 2006.

WALKOWITZ, R. Immigrant Fictions: Contemporary Literature in an Age of Globalization. Madison, Wisconsin: University of Wisconsin Press, 2010.

WEI, T.; LU, Y.; CHANG, H. A semantic approach for text clustering using WordNet and lexical chains. Expert Systems with Applications, 42(4), p. 2264-2275, 2015.

WEIPING, W.; PENG, C.; BOWEN, L. A Self-Adaptive Explicit Semantic Analysis Method for Computing Semantic Relatedness Using Wikipedia. Proceedings of the 2008 International Seminar on Future Information Technology and Management Engineering, 2008.

WILDER, L. A. Rhetorical Strategies and Genre Conventions in Literary Studies: Teaching and Writing in the Disciplines. Southern Illinois University Press, 2012.

WOLTERS, M.; KIRSTEN, M. Exploring the Use of Linguistic Features in Domain and Genre Classification. Proceedings of the ninth conference on European chapter of the Association for Computational Linguistics, Bergen, Norway, 1999.

XIAO, Z.; MCENERY, A. Two Approaches to Genre Analysis: Three Genres in Modern American English. Journal of English Linguistics, 33(1), p. 62-82, 2005. doi:10.1177/0075424204273957.




DOI: http://dx.doi.org/10.17851/1983-3652.13.2.%25p





Texto Livre: Linguagem e Tecnologia
ISSN 1983-3652 (electronic)

Faculdade de Letras da Universidade Federal de Minas Gerais

Belo Horizonte - Minas Gerais (Brazil)

This work is licensed under a Creative Commons Attribution 4.0 International License.