NS-CUK Open Datasets

This page serves as a repository for open datasets created by members and collaborators of the Network Science Lab. We aim to improve the availability of open datasets and the reproducibility of experiments, especially in interdisciplinary studies using artificial intelligence and data science perspectives.


Digital Humanities and Social Sciences

The following datasets are part of the Open Datasets for Digital Humanities and Social Sciences project. This initiative seeks to enhance the availability of open datasets in the digital humanities and social sciences domains. The primary challenge in this field is the labour-intensive nature of data acquisition. Our project addresses this hurdle by providing meticulously curated, high-quality datasets and promoting their accessibility via open platforms. Spearheaded by the Network Science Lab at the Catholic University of Korea, this endeavour benefits from the collaborative efforts of scholars from diverse research institutions.


Digital Literature

A Social Network of Characters Appeared in the French Novel Series “La Comédie humaine” and An Interaction Network of Novels in the Series

  • Description: This dataset presents a character social network and a novel interaction network derived from Honoré de Balzac’s French novel series “La Comédie humaine”, written between 1829 and 1848. The series is a multi-volume set of interconnected novels grouped into eight thematic categories: “Scenes of Private Life,” “Scenes of Provincial Life,” “Scenes of Parisian Life,” “Scenes of Political Life,” “Scenes of Military Life,” “Scenes of Country Life,” “Philosophical Studies,” and “Analytical Studies.” Of the 137 works in the series, completed and unfinished, we extracted the core texts of 80 novels. The narratives are interwoven, with characters appearing in several novels. Because Balzac intended to portray 19th-century French society, particularly during the Restoration (1815-1830) and the July Monarchy (1830-1848), the characters’ overlapping paths across novels offer insight into his social commentary. We first constructed a social network of the characters based on their co-occurrence rates in the novels, to capture each character’s significance and connection to the themes of the series. We then constructed an interaction network of the novels, weighted by the number of characters shared between any two novels. This approach allowed us to identify patterns linking Balzac’s use of characters to the way he expresses the themes of his works.
  • Cite as: Eun-Soon You, Hyeon-Ju Jeon, O-Joun Lee: A Social Network of Characters Appeared in the French Novel Series “La Comédie humaine” and An Interaction Network of Novels in the Series. Figshare 08/2023. (Dataset)
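
The two constructions described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the input mapping `novel_characters`, the placeholder names, and the function names are all hypothetical, and raw co-occurrence counts stand in for the co-occurrence rates, whose exact definition is not given here.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy input: novel -> set of characters appearing in it.
# Placeholder identifiers only; nothing here is taken from the dataset.
novel_characters = {
    "novel_A": {"c1", "c2", "c3"},
    "novel_B": {"c1", "c4"},
    "novel_C": {"c1", "c2", "c4"},
}


def character_network(novel_characters):
    """Undirected edges between characters.

    Assumption: edge weight = number of novels in which both characters appear.
    """
    edges = Counter()
    for chars in novel_characters.values():
        # sorted() gives each unordered pair a single canonical key
        for a, b in combinations(sorted(chars), 2):
            edges[(a, b)] += 1
    return dict(edges)


def novel_network(novel_characters):
    """Undirected edges between novels, weighted by the number of shared characters."""
    edges = {}
    for n1, n2 in combinations(sorted(novel_characters), 2):
        shared = len(novel_characters[n1] & novel_characters[n2])
        if shared:  # no edge when two novels share no character
            edges[(n1, n2)] = shared
    return edges


char_edges = character_network(novel_characters)
novel_edges = novel_network(novel_characters)
```

Both results are plain dictionaries keyed by node pairs, so they can be written out as edge lists or passed to a graph library of choice.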

A Dynamic Social Network of Characters Appeared in the French Novel “Les Liaisons Dangereuses”

  • Description: Social networks of fictional characters support quantitative analysis of narrative elements by highlighting the importance, roles, and groupings of characters. This dataset presents a social network of characters from the epistolary French novel “Les Liaisons Dangereuses,” written by Pierre Choderlos de Laclos in 1782. Relationships between characters are measured by the frequency of letter exchanges, which yields an asymmetric, directed graph. While static social networks provide significant insights into literature, tracking how character relationships change over time can reveal plot developments such as rising tensions or emerging conflicts. We therefore constructed a dynamic social network of the characters across four intervals corresponding to the novel’s four chapters. Each interval is cumulative, capturing interaction rates from the start of the novel to the end of that chapter. Comparing the four intervals shows how character relationships evolve over the course of the narrative.
  • Cite as: Eun-Soon You, Hyeon-Ju Jeon, O-Joun Lee: A Dynamic Social Network of Characters Appeared in the French Novel “Les Liaisons Dangereuses”. Figshare 08/2023. (Dataset)
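
The cumulative intervals described above can be sketched as follows. This is an illustrative sketch under stated assumptions: the `(chapter, sender, recipient)` record layout, the toy records, and the function name are hypothetical, and raw letter counts stand in for the interaction rates.

```python
from collections import Counter

# Hypothetical letter records: (chapter, sender, recipient).
# Placeholder data only; not taken from the dataset.
letters = [
    (1, "A", "B"),
    (1, "B", "A"),
    (1, "A", "B"),
    (2, "A", "C"),
    (3, "C", "A"),
    (4, "B", "C"),
]


def cumulative_snapshots(letters, n_chapters=4):
    """Return one directed, weighted edge dict per chapter.

    Snapshot k counts every letter sent in chapters 1..k, so each snapshot
    covers the novel from its start to the end of chapter k.
    """
    running = Counter()
    snapshots = []
    for chapter in range(1, n_chapters + 1):
        for ch, sender, recipient in letters:
            if ch == chapter:
                # (sender, recipient) is ordered, so the graph is directed
                running[(sender, recipient)] += 1
        snapshots.append(dict(running))  # copy the state at the end of this chapter
    return snapshots


snapshots = cumulative_snapshots(letters)
```

Because the edge key is the ordered pair `(sender, recipient)`, the weight from A to B can differ from the weight from B to A, which matches the asymmetric, directed representation.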


Computational Social Sciences

Korean News Corpus for Changes in Child Abuse before and after COVID-19 Outbreak

  • Description: During the COVID-19 pandemic, child abuse moved to the forefront of societal concerns in Korea. The public became increasingly attentive to child welfare issues, and every major Korean newspaper covered instances of child abuse, delving into the details of each case and the backgrounds of both victims and perpetrators. To trace the origin of this heightened awareness, we analyzed Korean news articles from the periods before and after the pandemic’s onset. We collected articles published from July 2017 to June 2022 (two and a half years before and two and a half years after the outbreak) from Naver News, a leading Korean news portal, using six search terms: “child abuse,” “child neglect,” “corporal punishment,” “emotional abuse,” “physical abuse,” and “sexual abuse.” Our methodology involved part-of-speech (POS) tagging and stop-word removal to assess keyword frequency. Our objective was to determine whether the COVID-19 pandemic was a direct catalyst for the uptick in child abuse reports or whether the issue had previously been underreported. Additionally, we built dynamic semantic networks from the articles to chart monthly shifts in the narratives around child abuse in Korea and society’s evolving response to it. The collected news data are archived in Pickle files, with fields organized under dictionary keys such as publication date and time, publisher, title, URL, and article content.
  • Cite as: Jooho Lee, Yoewon Yoon, O-Joun Lee: Korean News Corpus for Changes in Child Abuse before and after COVID-19 Outbreak. Figshare 08/2023. (Dataset)
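
As a rough illustration of how such Pickle files might be read and turned into monthly keyword frequencies, consider the sketch below. Everything specific in it is an assumption: the key names (`date`, `publisher`, `title`, `url`, `content`), the date format, the list-of-dictionaries layout, and the sample records are hypothetical, and whitespace tokenization stands in for POS tagging, which for Korean text would require a morphological analyzer.

```python
import pickle
import tempfile
from collections import Counter, defaultdict
from pathlib import Path

# Hypothetical records; key names and text are placeholders, not dataset content.
sample_articles = [
    {"date": "2019-12-03 09:00", "publisher": "P1", "title": "t1",
     "url": "u1", "content": "the report on abuse and neglect"},
    {"date": "2020-01-15 10:30", "publisher": "P2", "title": "t2",
     "url": "u2", "content": "abuse report and the response"},
    {"date": "2020-01-20 08:00", "publisher": "P1", "title": "t3",
     "url": "u3", "content": "neglect and abuse"},
]

STOP_WORDS = {"the", "on", "and"}  # illustrative stop-word list


def monthly_keyword_counts(articles, stop_words):
    """Map 'YYYY-MM' -> Counter of tokens that survive stop-word removal."""
    counts = defaultdict(Counter)
    for article in articles:
        month = article["date"][:7]  # assumes dates start with 'YYYY-MM'
        tokens = [t for t in article["content"].lower().split()
                  if t not in stop_words]
        counts[month].update(tokens)
    return counts


# Round-trip through a temporary Pickle file to mimic loading an archive.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "articles.pkl"
    with path.open("wb") as f:
        pickle.dump(sample_articles, f)
    with path.open("rb") as f:
        articles = pickle.load(f)

counts = monthly_keyword_counts(articles, STOP_WORDS)
```

Pickle files can execute arbitrary code when loaded, so only files from a trusted source should be unpickled.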