The Theme Ontology Project is an open-access, community-based fiction studies undertaking to:
* define common literary themes (or "themes" for short),
* classify defined themes into a hierarchically structured controlled vocabulary, and
* annotate works of fiction with themes within a collaborative framework.
The Literary Theme Ontology (LTO) is the hierarchically structured collection of literary themes (Sheridan, Onsjö, and Hastings 2019). The current developmental version of the LTO contains over 3,000 carefully defined themes, and over 3,000 stories (e.g., films, novels, TV series episodes, and video game plots) have been annotated with LTO themes to date. All data is hosted in the Theme Ontology GitHub repository theming and can be explored on the Theme Ontology website. The stoRy package can be used to perform various statistical analyses on the data. Such analyses can tell us interesting things about the kinds of stories we humans invent.
The package is hosted on CRAN and can be installed by running the command
install.packages("stoRy")
The developmental version is hosted on GitHub and can be installed using the devtools package:
# install.packages("devtools")
# devtools::install_github("theme-ontology/stoRy")
Once installed, the package can be loaded by running the standard library command
library(stoRy)
Each function in the package is documented. The command
help(package = "stoRy")
gives a cursory overview of the package and a complete list of package functions.
Help with using functions is obtained in the usual R manner. For instance, the documentation for the get_similar_stories function can be accessed with the command
?get_similar_stories
The command
citation("stoRy")
prints to console everything needed to cite the package in a publication.
This section is a good starting point for first-time users. Included in the package is a toy dataset, extracted from the latest LTO version, comprising 2,945 LTO themes and 335 thematically annotated stories from the American media franchise The Twilight Zone (Wikipedia 2021).
The themes are hierarchically arranged into three domains descended from an abstract root theme:
literary thematic entity
├── the human world
├── the natural world
└── alternate reality
Contained in the demo data are the following Twilight Zone thematically annotated stories:
156 The Twilight Zone (1959) television series episodes
3 Twilight Zone: The Movie (1983) film sub-stories
110 The Twilight Zone (1985) television series episodes
3 Twilight Zone: Rod Serling's Lost Classics (1994) film sub-stories
43 The Twilight Zone (2002) television series episodes
20 The Twilight Zone (2019) television series episodes
Users may avail themselves of the demo data to experiment with package functions without having to download any official LTO version data to their local machine.
To begin, check that the demo LTO version is active:
which_lto()
If another LTO version is currently active, switch to the demo version:
set_lto(version = "demo")
Print summary information for the demo LTO version to the console:
print_lto()
A detailed description of the demo data included in the package can be viewed by running the command
?`lto-demo`
Pro Tip: The demo data, and LTO version data more generally, is stored internally in a way that prevents users from modifying it. The following commands, however, clone the data into tibbles that can be modified freely:
demo_metadata_tbl <- clone_active_metadata_tbl()
demo_themes_tbl <- clone_active_themes_tbl()
demo_stories_tbl <- clone_active_stories_tbl()
demo_collections_tbl <- clone_active_collections_tbl()
See ?`lto-demo` for more details on exploring the contents of the output tibbles.
The social phenomenon of dangers, be they real or imagined, spreading through a community as a result of rumors and fear is explored in numerous works of fiction, including several Twilight Zone episodes.
The LTO captures hysteria of this kind with the theme "mass hysteria". To explore the "mass hysteria" theme in the demo version, initialize a Theme class object:
theme <- Theme$new(theme_name = "mass hysteria")
The theme entry can be printed to the console in two ways:
# Print stylized text:
theme
# Print in plain text .th.txt file format:
theme$print(canonical = TRUE)
The theme's story annotations are returned as a tibble:
theme$annotations()
See ?Theme for more on Theme class objects.
A Note on Finding Themes: LTO developmental themes are easily explored using the theme search box on the project website. Chances are that any theme found in the developmental version will also exist in the demo version. So searching for themes on the website offers a practical approach to finding interesting themes to initialize in an R session.
Pro Tip: Demo version themes are explorable in tibble format. For example, here is one way to search for "mass hysteria" directly in the demo themes:
# install.packages("dplyr")
suppressMessages(library(dplyr))
# install.packages("stringr")
library(stringr)
demo_themes_tbl <- clone_active_themes_tbl()
demo_themes_tbl %>% filter(str_detect(theme_name, "mass"))
Notice that all themes containing the substring "mass" are returned. The dplyr package is required to run the %>% mediated pipeline, and the stringr package supplies str_detect().
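If dplyr and stringr are not at hand, the same substring search can be done in base R. The sketch below runs on a small, illustrative stand-in for the themes tibble (the data frame themes_df and its theme names are hypothetical, and only a theme_name column is modeled), so it does not need the stoRy package loaded.

```r
# Hypothetical stand-in for clone_active_themes_tbl(): only the theme_name
# column is modeled, and the names are illustrative rather than LTO data.
themes_df <- data.frame(
  theme_name = c("mass hysteria", "mass exodus", "paranoia", "fear of the unknown"),
  stringsAsFactors = FALSE
)

# Base R counterpart of filter(str_detect(theme_name, "mass")):
# grepl() flags every name containing "mass"; fixed = TRUE matches it literally.
mass_themes <- themes_df[grepl("mass", themes_df$theme_name, fixed = TRUE), , drop = FALSE]
mass_themes
```

With the real data, the same indexing idiom applies to demo_themes_tbl, whose theme_name column is used in the filter() call above.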
Thematically annotated stories are initialized by story ID. For example, run
story <- Story$new(story_id = "tz1959e1x22")
to initialize a Story class object representing The Monsters Are Due on Maple Street, a classic episode of The Twilight Zone (1959) television series that features the theme "mass hysteria".
Story thematic annotations, along with metadata identifying the episode, can be printed to the console:
# In stylized text format:
story
# In plain text .st.txt file format:
story$print(canonical = TRUE)
A tibble of thematic annotations is obtained by running
themes <- story$themes()
themes
See ?Story for more on Story class objects.
A Note on Finding Story IDs: The project website story search box offers a quick-and-dirty way of locating LTO developmental version story IDs of interest. Since story IDs are stable, The Twilight Zone story IDs in the developmental version can be expected to agree with their demo data counterparts.
Pro Tip: A demo data story ID is directly obtained from an episode title as follows:
title <- "The Monsters Are Due on Maple Street"
demo_stories_tbl <- clone_active_stories_tbl()
story_id <- demo_stories_tbl %>% filter(title == !!title) %>% pull(story_id)
story_id
The dplyr package is again required to run the %>% mediated pipeline.
Each story belongs to at least one collection (i.e., a set of related stories). The Monsters Are Due on Maple Street, for instance, belongs to two collections:
story$collections()
To initialize a Collection class object for The Twilight Zone (1959) television series, of which The Monsters Are Due on Maple Street is an episode, run:
collection <- Collection$new(collection_id = "Collection: tvseries: The Twilight Zone (1959)")
Collection info is printed to the console in the same way as for themes and stories:
# Print stylized text:
collection
# Print in plain text .st.txt file format:
collection$print(canonical = TRUE)
A Note on Finding Collection IDs: As with stories, LTO developmental version collections can be explored from the project website story search box. Developmental and demo version collection IDs should generally match up. This is in particular the case with Twilight Zone collection IDs.
Pro Tip: Demo version collections can be explored directly in the usual way:
demo_collections_tbl <- clone_active_collections_tbl()
demo_collections_tbl
To view the top 10 most featured themes in The Twilight Zone (1959) series, run:
collection <- Collection$new(collection_id = "Collection: tvseries: The Twilight Zone (1959)")
result_tbl <- get_featured_themes(collection)
result_tbl
To view the top 10 most featured themes in the demo data as a whole, run:
result_tbl <- get_featured_themes()
result_tbl
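To get a feel for what "most featured" means, the sketch below ranks themes by the number of distinct stories they are annotated in. That get_featured_themes() ranks by a count of this kind is an assumption made for illustration. The annotations data frame and its story and theme names are hypothetical.

```r
# Hypothetical annotations: one row per (story, theme) pair.
annotations <- data.frame(
  story_id   = c("s1", "s1", "s2", "s2", "s3", "s3", "s3"),
  theme_name = c("fear", "paranoia", "fear", "time travel",
                 "fear", "paranoia", "wish fulfillment"),
  stringsAsFactors = FALSE
)

# Drop duplicate (story, theme) pairs so each story counts once per theme.
pairs <- unique(annotations[, c("story_id", "theme_name")])

# Number of distinct stories featuring each theme, largest count first.
story_counts <- sort(table(pairs$theme_name), decreasing = TRUE)

# Keep the top_n most featured themes (the package calls above show a top 10).
top_n <- 2
top_themes <- story_counts[seq_len(min(top_n, length(story_counts)))]
top_themes
```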
To view the top 10 most enriched (i.e., over-represented) themes in The Twilight Zone (1959) series, with all The Twilight Zone franchise stories as the background, run:
test_collection <- Collection$new(collection_id = "Collection: tvseries: The Twilight Zone (1959)")
result_tbl <- get_enriched_themes(test_collection)
result_tbl
To run the same analysis without counting minor-level themes, run:
result_tbl <- get_enriched_themes(test_collection, weights = list(choice = 1, major = 1, minor = 0))
result_tbl
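One plausible reading of the weights argument is that each annotation contributes the weight of its level (choice, major, or minor), so that minor = 0 drops minor themes from the counts. The sketch below applies that weighting to a hypothetical annotations data frame with a level column. How get_enriched_themes() actually uses the weights is not reproduced here. See ?get_enriched_themes for that.

```r
# Hypothetical annotations with the level at which each theme is featured.
annotations <- data.frame(
  story_id   = c("s1", "s1", "s2", "s2", "s3"),
  theme_name = c("fear", "paranoia", "fear", "paranoia", "fear"),
  level      = c("choice", "minor", "major", "minor", "minor"),
  stringsAsFactors = FALSE
)

# Same shape as the weights argument passed to get_enriched_themes() above.
weights <- list(choice = 1, major = 1, minor = 0)

# Look up each annotation's weight from its level.
annotations$weight <- unlist(weights[annotations$level], use.names = FALSE)

# Sum the weights per theme: minor annotations now contribute nothing.
weighted_counts <- vapply(split(annotations$weight, annotations$theme_name),
                          sum, numeric(1))
weighted_counts
```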
The theory and methods implemented in the get_enriched_themes function are described in Onsjö and Sheridan (2020).
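As a minimal sketch of an over-representation test, the classic one-sided hypergeometric test below asks how surprising it is to see k of the n_test stories in a test collection featuring a theme, when K of all N_background stories feature it. The theme counts are made up for illustration (N_background = 335 and n_test = 156 echo the demo data sizes). The Benjamini-Hochberg adjustment is one common choice for testing many themes at once. The package's own test, including its handling of weights and multiple testing, may differ.

```r
# Sizes echoing the demo data: all franchise stories vs. the 1959 series.
N_background <- 335
n_test <- 156

# Made-up counts for three hypothetical themes.
counts <- data.frame(
  theme_name = c("theme A", "theme B", "theme C"),
  K = c(40, 25, 10),   # background stories featuring the theme
  k = c(30, 10, 8),    # test-collection stories featuring the theme
  stringsAsFactors = FALSE
)

# P(X >= k) when n_test stories are drawn without replacement from
# N_background stories, K of which feature the theme.
counts$p_value <- phyper(counts$k - 1, counts$K, N_background - counts$K,
                         n_test, lower.tail = FALSE)

# Benjamini-Hochberg adjustment across the themes tested.
counts$q_value <- p.adjust(counts$p_value, method = "BH")

# Most enriched (smallest p-value) first.
counts[order(counts$p_value), ]
```

Theme B appears in fewer test stories than expected by chance (about 11.6), so its p-value is large and it would not count as enriched.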
To view the top 10 Twilight Zone franchise stories most thematically similar to The Monsters Are Due on Maple Street, run:
query_story <- Story$new(story_id = "tz1959e1x22")
result_tbl <- get_similar_stories(query_story)
result_tbl
The theory and methods implemented in the get_similar_stories function are described in Sheridan et al. (2019).
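The method of Sheridan et al. (2019) is ontology-based. The sketch below swaps in a much simpler measure, cosine similarity between binary story-by-theme vectors, to show the general shape of ranking stories by thematic similarity to a query. The incidence matrix and the story and theme names are hypothetical, and the theme hierarchy is ignored.

```r
# Hypothetical story-by-theme incidence matrix: 1 if the story features the theme.
incidence <- rbind(
  query   = c(1, 1, 1, 0, 0),
  story_a = c(1, 1, 0, 0, 1),
  story_b = c(0, 0, 0, 1, 1),
  story_c = c(1, 1, 1, 1, 0)
)
colnames(incidence) <- paste0("theme_", 1:5)

# Cosine similarity: dot product divided by the product of the vector lengths.
cosine <- function(x, y) sum(x * y) / sqrt(sum(x^2) * sum(y^2))

# Score every other story against the query story, most similar first.
candidates <- incidence[rownames(incidence) != "query", , drop = FALSE]
scores <- apply(candidates, 1, cosine, y = incidence["query", ])
sort(scores, decreasing = TRUE)
```

Stories sharing more themes with the query score higher. story_b shares none and scores 0.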
Cluster The Twilight Zone franchise stories according to thematic similarity by running:
set.seed(123)
result_tbl <- get_story_clusters()
result_tbl
The command set.seed(123) is run here to make the clustering result reproducible.
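Many clustering algorithms start from random initial choices, so repeated runs can group stories differently. The sketch below uses base R's kmeans() on a hypothetical story-by-theme matrix to show that fixing the seed makes the cluster assignments identical across runs. kmeans() is used only for illustration and is not necessarily the algorithm behind get_story_clusters().

```r
# Hypothetical story-by-theme matrix with two obvious groups of stories.
incidence <- rbind(
  s1 = c(1, 1, 1, 0, 0, 0),
  s2 = c(1, 1, 0, 0, 0, 0),
  s3 = c(1, 0, 1, 0, 0, 0),
  s4 = c(0, 0, 0, 1, 1, 1),
  s5 = c(0, 0, 0, 1, 1, 0),
  s6 = c(0, 0, 0, 0, 1, 1)
)

run_clustering <- function(seed) {
  set.seed(seed)  # fixes the random starting centers chosen by kmeans()
  kmeans(incidence, centers = 2, nstart = 5)$cluster
}

first  <- run_clustering(123)
second <- run_clustering(123)
identical(first, second)    # same seed, same cluster assignments
split(names(first), first)  # story names grouped by cluster
```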
Explore a cluster of stories related to traveling back in time
cluster_id <- 3
pull(result_tbl, stories)[[cluster_id]]
pull(result_tbl, themes)[[cluster_id]]
Explore a cluster of stories related to mass panics
cluster_id <- 5
pull(result_tbl, stories)[[cluster_id]]
pull(result_tbl, themes)[[cluster_id]]
Explore a cluster of stories related to executions
cluster_id <- 7
pull(result_tbl, stories)[[cluster_id]]
pull(result_tbl, themes)[[cluster_id]]
Explore a cluster of stories related to space aliens
cluster_id <- 10
pull(result_tbl, stories)[[cluster_id]]
pull(result_tbl, themes)[[cluster_id]]
Explore a cluster of stories related to old people wanting to be young
cluster_id <- 11
pull(result_tbl, stories)[[cluster_id]]
pull(result_tbl, themes)[[cluster_id]]
Explore a cluster of stories related to wish making
cluster_id <- 13
pull(result_tbl, stories)[[cluster_id]]
pull(result_tbl, themes)[[cluster_id]]
The package works with data from the following LTO versions:
lto_version_statuses()
To download and cache the latest versioned LTO release, run:
configure_lto(version = "latest")
This can take a while.
Load the newly configured LTO version as the active version in the R session:
set_lto(version = "latest")
To double-check that it has been loaded successfully, run:
which_lto()
Now that the latest LTO version is loaded into the R session, its stories and themes can be analyzed in the same way as with the "demo" LTO version data as shown above.
Onsjö, Mikael, and Paul Sheridan. 2020. “Theme Enrichment Analysis: A Statistical Test for Identifying Significantly Enriched Themes in a List of Stories with an Application to the Star Trek Television Franchise.” Digital Studies/Le Champ Numérique 10 (1): 1.
Sheridan, Paul, Mikael Onsjö, and Janna Hastings. 2019. “The Literary Theme Ontology for Media Annotation and Information Retrieval.” In Proceedings of the Joint Ontology Workshops 2019.
Sheridan, Paul, Mikael Onsjö, Claudia Becerra, Sergio Jimenez, and George Dueñas. 2019. “An Ontology-Based Recommender System with an Application to the Star Trek Television Franchise.” Future Internet 11 (9).
Wikipedia. 2021. “The Twilight Zone — Wikipedia, the Free Encyclopedia.” https://en.wikipedia.org/w/index.php?title=The%20Twilight%20Zone&oldid=1042023157.