An annotation of cuts, depicted locations, and temporal progression in the motion picture "Forrest Gump"

Here we present an annotation of locations and temporal progression depicted in the movie “Forrest Gump”, as an addition to a large public functional brain imaging dataset (http://studyforrest.org). The annotation provides information about the exact timing of each of the 870 shots, and the depicted location after every cut with a high, medium, and low level of abstraction. Additionally, four classes are used to distinguish the differences of the depicted time between shots. Each shot is also annotated regarding the type of location (interior/exterior) and time of day. This annotation enables further studies of visual perception, memory of locations, and the perception of time under conditions of real-life complexity using the studyforrest dataset.

This article is included in the Real-life cognition channel.
This article is included in the Neuroinformatics channel.

Cognitive neuroimaging research is moving towards studying brain behavior under conditions of real-life-like complexity, and motion pictures are being utilized with increasing frequency as stimuli in "neurocinematics" studies 1. What sets motion pictures apart from other dynamic naturalistic stimuli is that they are more likely to evoke time-locked response patterns in a larger portion of the brain while retaining synchrony across multiple individuals who are experiencing the same movie 2,3. One likely reason for this is the structure of movies. They are typically not prolonged, contiguous captures of an environment from a first person perspective, but rather they are carefully assembled, using "cuts", from hundreds of short sequences shot from a variety of perspectives 4. These cuts are sharp discontinuities in the sensory input that require all viewers to re-assess the depicted environment in order to perform a cognitive re-orientation in fictional space and time. This re-orientation can be complex and involve a large bandwidth of cognitive processes: interpretation of contextual cues for detection of familiar settings, retrieval of prior knowledge from memory, discovery of change in locales and depicted characters. Consequently movies, and their cuts in particular, offer an excellent instrument to study complex, concurrent, real-life cognition.
In this study, we focus on spatial and temporal viewer re-orientation, and, to this end, describe changes in depicted location and time for all cuts in the motion picture "Forrest Gump". This movie is the core stimulus of the studyforrest project (http://studyforrest.org). Two fMRI datasets are publicly available: 1) participants listening to an audio-movie version 5 and 2) a subset of the original participants watching the audio-visual movie with simultaneous eye tracking 6. Additional imaging data and movie annotations are available 7,8, including an individual localization of the parahippocampal place area 9 that has been implicated in spatial perception and scene processing 10.
This new annotation extends the available knowledge about the structure of this complex natural stimulus and enriches the overall studyforrest dataset. These data can be used to investigate the formation of a representation of viewer location and the perception of (speeded or negative) temporal progression in the movie stimulus.
For any study focusing on other aspects of real-life cognition, these new data can serve as additional confound measures describing key properties of major building blocks of this movie stimulus.

Stimulus
The annotated stimulus was a slightly shortened (≈2 h) version of the movie Forrest Gump (R. Zemeckis, Paramount Pictures, 1994) with dubbed German soundtrack that is identical to the audio-visual movie annotated in 8. Further details on this particular movie cut, and how to reproduce it from commercially available sources, are available in 6.

Annotation procedure
First, the movie was explored by two people, one of whom has an academic background in documentary film making, in order to generate a consistent list of labels for depicted and recurring locations.
Subsequently, the actual annotation was performed by the first author using a multi-pass strategy. The movie was manually inspected frame-by-frame to determine the location of cuts (using the video editor Shotcut v16.02.01). For each new shot (sequence between two cuts), a number of properties (described below) were discerned and entered into a table. A total of four passes were performed by the same observer in order to validate the annotation.

Data legend
The annotation table contains one line per shot and seven columns: 1) a shot's start time, 2) a label for the shot's major location, 3) a label for the setting within the location, 4) a label for the locale within the setting, 5) a flag indicating an interior or exterior setting, 6) a label for the type of temporal progression with respect to the previous shot, and 7) a label for the time of day. Further details are provided in the following sections. The respective column header labels are given in parentheses.
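As a minimal sketch of how such a table could be read, the following Python snippet parses a small in-memory CSV using only the standard library. The column labels and their order follow the legend above; the three data rows are invented for illustration and are not taken from the released file, and `read_shots` is a hypothetical helper name.

```python
import csv
import io

# Header follows the seven column labels of the data legend.
# The rows below are made-up illustrative values, not real annotation data.
SAMPLE = """time,major_location,setting,locale,intorext,flowoftime,timeofday
0.0,Greenbow,elementary school,elementary school,int,0,day
12.5,Greenbow,football field,football field,ext,+,day
30.0,Vietnam,jungle,glade at the river,ext,++,day
"""

def read_shots(handle):
    """Return one dict per shot, with the start time converted to float seconds."""
    shots = []
    for row in csv.DictReader(handle):
        row["time"] = float(row["time"])
        shots.append(row)
    return shots

shots = read_shots(io.StringIO(SAMPLE))
print(len(shots), shots[0]["major_location"])
```

Reading with the `csv` module keeps every label as a string, which avoids the "0" category of `flowoftime` being silently converted to a number.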

Shot start time (time)
A shot's start time is defined as the onset time of the first video frame of a shot after a cut. Time stamps are provided in seconds from movie onset.
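Because only onsets are stored, a shot's duration has to be derived from the onset of the following shot. The sketch below illustrates this; the function name `shot_durations` and the optional `movie_end` argument are assumptions for this example, since the total stimulus duration is not part of the annotation table.

```python
def shot_durations(onsets, movie_end=None):
    """Derive shot durations (seconds) from a sorted list of shot onsets.

    The last shot has no following cut, so its duration is only
    returned when the end time of the movie is supplied.
    """
    durations = [nxt - cur for cur, nxt in zip(onsets, onsets[1:])]
    if movie_end is not None and onsets:
        durations.append(movie_end - onsets[-1])
    return durations

# Invented onsets, for illustration only.
example = shot_durations([0.0, 12.5, 30.0], movie_end=40.0)
print(example)
```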

Location
Location was coded with three labels, each describing the depicted scenery with an increasing level of detail.
Major location (major_location) provides a coarse identification at the level of a town, county, or region where the respective story is taking place. Examples are: "Greenbow" or "Vietnam".
Setting (setting) further details the location by distinguishing places that are at the same major location but are not in direct sight of each other. For example, Forrest Gump's elementary school and the high school's football field are both in Greenbow, Alabama but are not part of the same setting. A switch from one setting to another is typically synonymous with a transition to a new scene in a cinematographic sense. If the camera switched settings within a scene, the annotation deviates from the screenplay to make explicit the switch to another setting.
Locale (locale) subdivides settings into distinguishable locales. Indoors, a locale is congruent with a particular room enclosed by walls. For example, Forrest Gump's bedroom, the corridor downstairs, and the corridor upstairs are three different rooms inside the Gumps' house (setting) on the Gumps' property (major location). Outdoors, locales were distinguished when they were separated by a logical boundary, substantial distance, or shared no discernible landmarks. For example, the glade at the river and the location of the wounded Bubba are two different locales in the embattled jungle (setting) in Vietnam (major location). A locale's label is identical to its setting label when only one locale is depicted for that setting.
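One possible use of the three nested labels is to classify each cut by the coarsest level at which the depicted location changes. The following sketch assumes shots are available as dictionaries keyed by the column labels; the helper `coarsest_change` and the example rows are hypothetical and only loosely based on the examples given in the text.

```python
# Location columns ordered from coarsest to finest, as in the data legend.
LEVELS = ("major_location", "setting", "locale")

def coarsest_change(prev_shot, cur_shot):
    """Return the coarsest location level that differs between two shots, or None."""
    for level in LEVELS:
        if prev_shot[level] != cur_shot[level]:
            return level
    return None

# Invented rows for illustration; labels are not copied from the released file.
shots = [
    {"major_location": "Greenbow", "setting": "Gumps' house", "locale": "bedroom"},
    {"major_location": "Greenbow", "setting": "Gumps' house", "locale": "corridor upstairs"},
    {"major_location": "Greenbow", "setting": "elementary school", "locale": "elementary school"},
    {"major_location": "Vietnam", "setting": "jungle", "locale": "glade at the river"},
]
changes = [coarsest_change(a, b) for a, b in zip(shots, shots[1:])]
print(changes)
```

A return value of `None` corresponds to a cut within the same locale, for example a sole change of camera perspective.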

Interior or exterior (intorext)
This flag indicates whether a particular location is an open ("ext") or enclosed space ("int"), such as a building or a vehicle.

Temporal progression (flowoftime)
This label indicates the depicted progression of time between the previous and the current shot. Four categories were distinguished: "-" labels a flashback, or jump into the past, independent of the temporal distance; "0" indicates no noticeable break in the ongoing stream of time, for example a sole change of viewing perspective; "+" represents noticeable jumps in time, ranging from several seconds to about one or two hours; and lastly "++" marks major time jumps from several hours (e.g. night vs. day) to several years.
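For analyses that need a numeric regressor, the four labels can be mapped to ordinal codes. The mapping below (-1, 0, 1, 2) is an illustrative choice made for this sketch and is not prescribed by the annotation; only the four label strings come from the description above, and the label sequence is invented.

```python
from collections import Counter

# Ordinal codes are an assumption of this example; only the labels are from the annotation.
FLOW_CODES = {"-": -1, "0": 0, "+": 1, "++": 2}

def encode_flow(labels):
    """Map flowoftime labels to ordinal codes, raising on unknown labels."""
    try:
        return [FLOW_CODES[label] for label in labels]
    except KeyError as err:
        raise ValueError(f"unexpected flowoftime label: {err.args[0]!r}") from None

labels = ["0", "0", "+", "-", "++", "0"]  # invented sequence
codes = encode_flow(labels)
counts = Counter(labels)
print(codes, dict(counts))
```

Note that the codes are ordinal only: a flashback ("-") is defined independent of temporal distance, so the numeric spacing carries no metric meaning.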

Time of day (timeofday)
This flag indicates whether a scene is at least partially illuminated by sunlight. Consequently, daytime and twilight (early sunrises and late sunsets) are labeled as "day". If sunlight is entirely missing, the time of day is coded as "night".

Dataset content
The released annotation is a single, text-based, comma-separated-value (CSV) formatted

Dataset validation
To check for human error in the cut time annotation, timings were compared to the results of an automatic detection algorithm and any deviation was manually verified.

Author contributions
CH designed, performed, and validated the annotation, and wrote the manuscript. MH provided critical feedback on the procedure and wrote the manuscript.

Competing interests
No competing interests were disclosed.

Grant information
Michael Hanke was supported by funds from the German federal state of Saxony-Anhalt and the European Regional Development Fund (ERDF), Project: Center for Behavioral Brain Sciences.
The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
I have only minor comments: The timing seems to be a bit off relative to the previously released dataset of scenes in the studyforrest GitHub repository. Presumably the annotation for shots and scenes should line up at scene starts, but there appears to be a consistent offset of about 120 ms. For example, the last scene ("School bus stop") starts at 6944.96 in the scenes.csv and 6944.84 in the attached dataset for shots in this paper. Moreover, the shots in this annotation don't quite line up with the shots.csv on the GitHub repo. The ~120 ms offset is too large to be a single frame. It appears the authors switched from Advene to Shotcut for movie segmentation and annotation; perhaps therein lies the source of the mismatch?
Could the authors expatiate on their method of identifying shots and cuts? If memory serves, in previous datasets they used an automated method to identify shots that was subsequently edited by hand. In this dataset, it appears all shots were identified by hand. Were all cuts identified? Or are there special cases where two cuts appearing in close succession were considered a part of one shot? For instance, in an action-heavy scene you could presumably get an overabundance of cuts, but that level of granularity isn't really useful (nothing changes) and potentially these could be combined into a single shot. If every cut was indeed identified and annotated, then my sincere condolences to the coder!
Although it is extremely generous of the authors to provide Python code for generating descriptive data and associated figures, I've examined this code file and unfortunately this reviewer simply cannot support the premature use of Python 3… You can pry 2.7 from my cold dead hands. ;)
Finally, I would like to again thank the authors for openly sharing this wealth of data with the community. These annotations and the associated imaging data represent a generous sharing of valuable resources, one that I have no doubt will be useful to many researchers interested in the neuroscience of naturalistic cognition.
I have read this submission. I believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.
Competing Interests: No competing interests were disclosed.
This Data Note presents a very useful (and labor intensive!) complement to the studyforrest dataset, providing additional annotations that can be used for data analysis: the timings of all the cuts in the movie, together with the depicted location and temporal progression for each transition. There are of course many, many other features that can be labeled in this movie, yet this particular set of features is useful on its own and will add to the bank of features already available. I have no major changes to suggest.
Minor comments: would the authors consider publishing the code for the automated detection routine that they used