
F1000Research (ISSN 2046-1402). F1000 Research Limited, London, UK.

DOI: 10.12688/f1000research.177481.1

Data Note

CYCLIST+IMU: A synchronized visual–inertial dataset for cyclist orientation and perception in urban environments

[version 1; peer review: 2 approved with reservations]

Luis Gómez-Meneses (1, a; ORCID: https://orcid.org/0000-0002-0667-7472). Roles: Conceptualization, Formal Analysis, Methodology, Project Administration, Software, Validation, Visualization, Writing – Original Draft Preparation.

Mauricio Arias-Correa (2). Roles: Investigation, Resources, Writing – Original Draft Preparation.

Jorge Herrera-Ramírez (3). Roles: Supervision, Validation, Writing – Review & Editing.

John R. Ballesteros (4). Roles: Data Curation, Investigation, Validation.

1 Faculty of Engineering, Instituto Tecnologico Metropolitano, Medellín, Antioquia, Colombia
2 Design Engineering Research Group (GRID), Universidad EAFIT, Medellín, Antioquia, Colombia
3 Faculty of Exact and Applied Sciences, Instituto Tecnologico Metropolitano, Medellín, Antioquia, Colombia
4 Department of Computer Science and Decision Sciences, Universidad Nacional de Colombia Sede Medellin, Medellín, Antioquia, Colombia

a luisgomez251811@correo.itm.edu.co

No competing interests were disclosed.

15 4 2026

2026

527

13 3 2026

2026

This is an open access article distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Background

Cyclists are among the most vulnerable road users in urban traffic environments. For autonomous vehicles to interact safely and effectively with cyclists, perception systems must go beyond detection and segmentation to include an explicit understanding of cyclist orientation. However, most existing cyclist datasets lack synchronized inertial metadata describing body orientation, limiting their use in multimodal and orientation-aware perception studies.

Methods

This Data Note presents a multimodal visual–inertial dataset acquired with the CYCLIST+IMU framework, which synchronizes monocular RGB images captured by a vehicle-mounted camera with inertial measurements recorded by bicycle-mounted and vehicle-mounted inertial measurement units (IMUs). Data were collected during multiple real-world urban acquisition sessions, yielding 3,606 RGB images, each temporally aligned with inertial measurements that include the cyclist's orientation angles (yaw and roll). Cyclist-centered image crops were then generated and manually annotated to produce polygon-based semantic segmentation labels and region-of-interest detection files, and relative depth maps were estimated from the RGB images. To improve angular coverage, a targeted data augmentation strategy based on horizontal image flipping was applied to underrepresented orientation ranges, adding 718 samples. The final dataset comprises 4,324 (3,606 + 718) synchronized multimodal samples organized in a hierarchical directory structure that preserves one-to-one correspondence across all data modalities.

Conclusions

The CYCLIST+IMU dataset provides synchronized RGB image crops, inertial orientation metadata, semantic segmentation annotations, relative depth maps, and detection files for 4,324 cyclist instances captured under real urban traffic conditions. By explicitly integrating visual and inertial data with precise temporal alignment and detailed documentation, this dataset enables reproducible research on cyclist orientation estimation, semantic segmentation, and multimodal sensor fusion for intelligent transportation systems.

Keywords: Cyclist dataset; Visual–inertial dataset; Cyclist orientation; Yaw and roll angles; Vulnerable road users; Semantic segmentation; Depth maps; Urban traffic scene.

The author(s) declared that no grants were involved in supporting this work.

1. Introduction

Conventional cars have proven remarkably inefficient. They not only consume energy but also erode human health and productivity. Private ownership exacerbates the problem, as vehicles spend most of their existence idle, wasting space and materials. Driven by these shortcomings, global interest in autonomous vehicles (AVs) has grown since the 1980s, mobilizing universities and industry leaders to rethink the very nature of mobility (Badue et al., 2019; Thrun, 2010; Narula & Tyagi, 2023).

Integrating AVs, whether for personal or public use, into heterogeneous traffic requires interactive capabilities that go beyond basic obstacle recognition. These systems must interact smoothly and safely with human drivers, cyclists, and pedestrians. In shared environments, human participants rely on a web of implicit cues, such as subtle adjustments in approach speed, together with explicit signals, such as eye contact or hand gestures. These interactions establish a mutual understanding that allows road users to coordinate their upcoming maneuvers. However, contemporary AV architectures tend to prioritize a strict, rationalist framework of collision avoidance over social negotiation. Consequently, these vehicles frequently exhibit non-human behavior, including abrupt stops, hesitant movements, or excessive delays at junctions. Such behaviors disrupt the temporal rhythm of traffic and can, paradoxically, undermine overall safety (Brown & Laurier, 2017; Brown, Broth & Vinkhuyzen, 2023).

An AV must accurately identify both road signage and traffic participants, particularly those without a protective mechanical enclosure, such as pedestrians and cyclists. Categorized as ‘Vulnerable Road Users’ (VRUs), these individuals face a disproportionate risk of severe injury or death in traffic accidents (Flohr, 2018; Mannion, 2019).

Road traffic injuries remain the twelfth leading cause of death across all age groups worldwide. VRUs account for more than half of the 1.19 million annual road traffic deaths reported by the World Health Organization (WHO, 2023). As seen in Figure 1, cyclists account for 5% of these global deaths, a share that has increased by nearly 20% over the last decade. This vulnerability is intensified in mixed traffic, where safety depends on a mutual understanding of motion, a form of social coordination that contemporary autonomous systems still struggle to replicate (Ghoul & Sayed, 2025; Lu et al., 2025).

Figure 1. Global percentage distribution of country-reported deaths by road users.

Source: ( World Health Organization, 2023).

Cyclist detection remains a challenge for AV perception systems, primarily because of the visual complexity introduced by non-rigid articulation, highly variable aspect ratios, and a wide range of spatial orientations. Beyond occlusions and cluttered urban scenes, recent research stresses that classification alone is insufficient: systems must also infer the cyclist's behavioral intent (Corral-Soto et al., 2025b). Consequently, orientation estimation has moved from a secondary metric to a critical precursor of the ‘reflexive adjustment’ an AV needs to navigate safely around cyclists and avoid collisions (Brown et al., 2023).

Despite these technical and social imperatives, a systematic review of cyclist datasets published between 2023 and 2025 reveals a significant gap in metadata. As summarized in Table 1, none of the real-world datasets reviewed provides three-dimensional orientation parameters (roll, pitch, yaw) for the cyclist appearing in the image frame; the only dataset that does, 3DArticCyclists, is synthetic. This lack of data limits the ability of AVs to interpret the cyclist's body language and, consequently, delays the achievement of what Brown et al. (2023) term the ‘sociality of traffic’ for AVs.

Table 1. Comparative analysis of cyclist datasets (2023–2025).

Source: Authors.

Work (APA citation)	Dataset characteristics	Orientation angles (roll, pitch, yaw) of the imaged cyclist provided? Acquisition method
Chiang, C. Y., et al. (2024). AllTheDocks road safety dataset: A cyclist’s perspective and experience.	AllTheDocks: Collected in London through citizen science. Includes video (61.68 km), accelerometer, GPS, and gyroscope data.	No (for cyclists in the image). The dataset includes gyroscope data (GyroX, GyroY, GyroZ), but these correspond exclusively to the ego-cyclist carrying the camera. Method: Telemetry extracted from helmet-mounted GoPro cameras.
Yan, Z., Li, J., Hang, P., & Sun, J. (2025). OnSiteVRU: A high-resolution trajectory dataset for high-density vulnerable road users.	OnSiteVRU: High-resolution trajectory data (0.04 s) collected in China. Covers intersections, road segments, and urban villages with 17,429 VRU trajectories.	No (partial). Only includes the heading angle (direction of motion relative to the X-axis). Roll and pitch for the cyclist posture in the image are not provided. Method: Extraction using elevated vision cameras (YOLOv7/DeepSORT) and onboard vehicle sensors (LiDAR/IMU).
Goren, D., & Caesar, H. (2025). BikeScenes: Online LiDAR semantic segmentation for bicycles.	BikeScenes-lidarseg: LiDAR semantic segmentation dataset captured from a bicycle perspective. Contains 3,021 scans annotated into 29 semantic classes.	No. Although the SenseBike platform includes an IMU for ego-motion compensation, cyclist metadata corresponds only to semantic segmentation labels, not orientation angles. Method: Offline LiDAR point cloud registration with GLIM and manual scan-level annotation.
Li, M., et al. (2025). A benchmark for cycling close pass detection from video streams.	Cyc-CP: Benchmark combining Victorian On-road Cycling (VOC) data and CARLA synthetic data. Focuses on close pass overtaking events.	No (partial). Predicts allocentric orientation angle (θ) of the overtaking vehicle. Roll, pitch, and yaw for cyclist posture are not defined. Method: Monocular 3D detection using FCOS3D on single-view video.
Desai, N. P., Etemad, A., & Greenspan, M. (2025). CycleCrash: A dataset of bicycle collision videos for collision prediction and analysis.	CycleCrash: 3,000 dashcam videos with 436,347 frames depicting cyclist collisions and near-miss events.	No. Direction annotations are limited to five discrete classes (forward, backward, left, right, stationary). Method: Web video curation and manual annotation based on traffic rules.
Corral-Soto, E. R., et al. (2025a). 3DArticCyclists: Generating synthetic articulated 8D pose-controllable cyclist data for computer vision applications.	3DArticBikes/3DArticCyclists: Hybrid synthetic–real dataset addressing cyclist data scarcity for autonomous driving. Includes 11,086 cyclist–bicycle configurations.	Yes. Provides full 3D orientation parameters (θx, θy, θz corresponding to roll, pitch, and yaw). Method: Synthetic generation using Blender and 3D Gaussian Splatting, with pose refinement via inverse kinematics based on real video data processed with CLIFF.

To address this gap, this paper presents a multimodal cyclist dataset that pairs real-world RGB imagery, estimated depth maps, and semantic segmentation with frame-by-frame inertial telemetry. Unlike synthetic frameworks such as 3DArticCyclists (Corral-Soto et al., 2025a), this dataset provides empirically measured three-axis orientation (roll, pitch, yaw) and triaxial acceleration (Ax, Ay, Az). By integrating these dynamic parameters, the dataset supports training autonomous navigation models that move beyond basic collision avoidance toward the social coordination required to achieve the ‘sociality of traffic’ (Brown et al., 2023).

2. Materials and methods

2.1 Data acquisition system

Cyclist images and synchronized visual–inertial data were captured using the CYCLOPS system (cyclists’ orientation data acquisition system using RGB camera and inertial measurement units). This custom-built system consists of two nodes: one mounted on a vehicle and one on a bicycle. The vehicle node includes a monocular RGB camera, an inertial measurement unit (IMU), an RF transceiver, and a microcontroller; the bicycle node includes an IMU, an RF transceiver, and a microcontroller. The system acquires images of a moving cyclist and associates each image with the cyclist's accelerations and orientation angles (Ax, Ay, Az, Roll, Pitch, Yaw), as illustrated in Figure 2. The camera's accelerations and orientation angles are likewise acquired at the vehicle, so that relative values (cyclist relative to camera) can be computed to establish the cyclist's actual orientation in each image while both vehicle and cyclist are moving in an urban environment.
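The text does not state how the relative values are computed. A minimal sketch, assuming both IMUs report ZYX (yaw, pitch, roll) Euler angles in degrees with respect to a common reference frame, composes the two orientations as R_rel = R_v^T · R_c and converts the result back to Euler angles. The helper names and the angle convention are assumptions for illustration, not the CYCLOPS implementation documented in Arias-Correa et al. (2024).

```python
import math

Matrix = list[list[float]]


def wrap_360(angle_deg: float) -> float:
    """Map an angle in degrees onto [0, 360)."""
    wrapped = angle_deg % 360.0
    return 0.0 if wrapped == 360.0 else wrapped


def euler_zyx_to_matrix(yaw: float, pitch: float, roll: float) -> Matrix:
    """Rotation matrix R = Rz(yaw) @ Ry(pitch) @ Rx(roll); angles in degrees (assumed convention)."""
    cy, sy = math.cos(math.radians(yaw)), math.sin(math.radians(yaw))
    cp, sp = math.cos(math.radians(pitch)), math.sin(math.radians(pitch))
    cr, sr = math.cos(math.radians(roll)), math.sin(math.radians(roll))
    return [
        [cy * cp, cy * sp * sr - sy * cr, cy * sp * cr + sy * sr],
        [sy * cp, sy * sp * sr + cy * cr, sy * sp * cr - cy * sr],
        [-sp, cp * sr, cp * cr],
    ]


def matrix_to_euler_zyx(r: Matrix) -> tuple[float, float, float]:
    """Inverse of euler_zyx_to_matrix (valid away from pitch = +/-90 degrees)."""
    yaw = math.degrees(math.atan2(r[1][0], r[0][0]))
    pitch = math.degrees(math.asin(max(-1.0, min(1.0, -r[2][0]))))
    roll = math.degrees(math.atan2(r[2][1], r[2][2]))
    return wrap_360(yaw), pitch, roll


def relative_orientation(cyclist: tuple[float, float, float],
                         vehicle: tuple[float, float, float]) -> tuple[float, float, float]:
    """Cyclist (yaw, pitch, roll) expressed in the vehicle/camera frame."""
    rc = euler_zyx_to_matrix(*cyclist)
    rv = euler_zyx_to_matrix(*vehicle)
    # R_rel = R_v^T @ R_c, written out element by element
    r_rel = [[sum(rv[k][i] * rc[k][j] for k in range(3)) for j in range(3)]
             for i in range(3)]
    return matrix_to_euler_zyx(r_rel)
```

For example, relative_orientation((10.0, 0.0, 0.0), (350.0, 0.0, 0.0)) gives a relative yaw of about 20°. Composing full rotations, rather than subtracting angles channel by channel, keeps the result meaningful when the vehicle itself is pitched or rolled; for pure heading differences it reduces to (yaw_c − yaw_v) mod 360°.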

Figure 2. Frame assignment for the camera attached to the car’s windshield (vehicle) and for the bicycle’s top tube (cyclist).

For the vehicle, the axes are named Xv, Yv, and Zv, and the rotations about these axes are ROLLv, PITCHv, and YAWv, respectively. Similarly, the cyclist frame has axes Xc, Yc, and Zc, with rotations ROLLc, PITCHc, and YAWc, respectively. Source: Adapted from the original in Arias-Correa et al. (2024).

A key feature of the CYCLOPS acquisition system is the synchronization between images and 6-axis motion data for both actors. Full details of the open-source hardware architecture, the IMU-based sensor fusion, and the software suite (VideoCapture) are documented in Arias-Correa et al. (2024) to support experimental reproducibility. A general diagram of the hardware and software for each node (cyclist node and camera node) of the CYCLOPS system is presented in Figure 3.

Figure 3. Diagram of the hardware and software for each node of the CYCLOPS system.

The Cyclist block includes a printed circuit board (PCB) comprising an IMU, an Arduino Nano board (Microcontroller-C), and an HC12 transceiver in transmission mode (Transceiver-C) with an antenna. Similarly, the Camera block includes the RGB camera mounted on the vehicle’s windshield, an IMU (IMU-V), an Arduino Nano board (Microcontroller-V), and an HC12 transceiver (Transceiver-V) with an antenna. Both the camera and the PCB send data to the acquisition software running on a computer. Source: Adapted from the original in Arias-Correa et al. (2024).

2.2 Visual–inertial data synchronization

The CYCLOPS system implements a hardware-level synchronization protocol to ensure precise temporal registration between the images from the RGB camera and the high-frequency inertial data from the IMU sensors. Synchronization is performed at acquisition time by a deterministic software architecture that manages concurrent sensor triggering and logging across the bicycle-mounted and vehicle-mounted nodes. Specifically, each RGB frame is synchronized to a discrete set of inertial measurements captured at the exact timestamp t_k of the camera's shutter release. This one-to-one temporal mapping constitutes the fundamental visual–inertial sample of the dataset. For every image captured by the vehicle node, the system records the cyclist's instantaneous kinematic state, including the orientation variables yaw_b and roll_b, referenced to the coordinate frames established during system calibration. Because this process is managed in real time using a unified system clock, no subsequent temporal interpolation, resampling, or offline alignment is required. Avoiding these post-processing steps preserves the raw sensor data and yields a temporally consistent snapshot of the cyclist's pose and motion at the moment of visual capture, as shown in Figure 4.
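The one-to-one mapping between a frame captured at t_k and the inertial record logged at the same instant can be sketched as an exact-key join. This is an illustration under stated assumptions: the record layout, the integer-millisecond timestamps, and the rule of rejecting frames with no (or an ambiguous) matching record are hypothetical and not taken from the CYCLOPS software. Consistent with the text, no interpolation is attempted; unmatched frames are simply set aside.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImuRecord:
    """Hypothetical inertial record logged at a shared-clock timestamp."""
    timestamp_ms: int
    yaw: float
    pitch: float
    roll: float
    ax: float
    ay: float
    az: float


@dataclass(frozen=True)
class Sample:
    """One visual-inertial sample: a frame and the record taken at its t_k."""
    frame_id: str
    timestamp_ms: int
    cyclist: ImuRecord


def pair_frames(frames: list[tuple[str, int]],
                records: list[ImuRecord]) -> tuple[list[Sample], list[str]]:
    """Join (frame_id, t_k) pairs to IMU records by exact timestamp; no interpolation."""
    by_time: dict[int, ImuRecord] = {}
    ambiguous: set[int] = set()
    for rec in records:
        if rec.timestamp_ms in by_time:
            ambiguous.add(rec.timestamp_ms)  # two records for one instant
        by_time[rec.timestamp_ms] = rec

    samples: list[Sample] = []
    rejected: list[str] = []
    for frame_id, t_k in frames:
        rec = by_time.get(t_k)
        if rec is None or t_k in ambiguous:
            rejected.append(frame_id)  # treated as a loss of synchronization
        else:
            samples.append(Sample(frame_id, t_k, rec))
    return samples, rejected
```

The vehicle-side record could be attached to each Sample in the same way; it is omitted here to keep the sketch short.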

Figure 4. Result of the CYCLOPS acquisition process.

The figure shows the output generated by the CYCLOPS system for several consecutive RGB frames acquired by the vehicle-mounted camera, together with their corresponding stored inertial measurements. Each frame is temporally synchronized with a discrete set of inertial data captured at the exact acquisition instant, establishing a one-to-one correspondence between visual and inertial information. This synchronized visual–inertial data product is the final output of the CYCLOPS framework.

All details regarding the CYCLOPS hardware components (including camera and IMU models), synchronization strategy, calibration procedures, and data verification experiments are fully described and validated in Arias-Correa et al. (2024) and are therefore not repeated here.

Each RGB image captured by the vehicle-mounted camera is associated with a corresponding set of inertial measurements describing the cyclist’s motion and orientation at the time of capture. Orientation variables, including yaw and roll, are derived from the inertial measurements and expressed in their respective reference frames as defined by the system configuration. Although the dataset also includes pitch measurements, this variable is not explicitly discussed here because the subsequent analysis and data augmentation procedures focus exclusively on yaw and roll.

2.3 Data acquisition protocol

Data were collected across multiple independent acquisition sessions conducted in urban environments. Each session is identified using the label Adq_x and corresponds to a continuous recording sequence acquired while the cyclist and the acquisition vehicle were in motion under real traffic conditions.

During each acquisition session, RGB images of the cyclist were recorded using the vehicle-mounted camera, while inertial measurements describing the cyclist’s motion and orientation were simultaneously captured by the bicycle-mounted and vehicle-mounted IMUs. A single sample is defined as an RGB image associated with its corresponding synchronized inertial measurements, including orientation angles such as yaw and roll, as described in Section 2.2.

The acquisition protocol was designed to capture natural cyclist behavior in unconstrained urban scenarios. Data were collected under daytime lighting conditions on public roads, without imposing specific trajectories or maneuvers on the cyclist beyond normal riding behavior.

Following data collection, a basic quality control process was applied. Samples exhibiting acquisition failures, severe occlusion of the cyclist, or loss of visual–inertial synchronization were excluded from the dataset. Only samples in which the cyclist was clearly visible and the corresponding inertial data were successfully recorded were retained for further processing and inclusion in the dataset.

In this work, a total of 3,606 RGB images were acquired, each associated with its corresponding inertial measurements obtained from the IMU units of the CYCLOPS system. Images and inertial data are stored following a hierarchical file structure designed to preserve the temporal correspondence between each image and its associated orientation and acceleration records. After data acquisition, the RGB images were manually annotated using the DarkLabel tool ( https://github.com/darkpgmr/DarkLabel ) to identify and delineate the cyclist region of interest (ROI). The resulting annotations were exported in YOLO format (*.txt files), providing normalized bounding box coordinates for each cyclist and enabling their direct use in object detection and subsequent analysis pipelines.
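The YOLO text format stores one object per line as a class index followed by the box center (x, y), width, and height, each normalized by the image dimensions. A minimal sketch for converting such lines back to pixel corners, for example to cut the cyclist crop from the original image, is shown below; the function names and the clamping to the image bounds are illustrative choices, not part of the released tooling.

```python
def yolo_to_pixels(line: str, img_w: int, img_h: int) -> tuple[int, int, int, int, int]:
    """Parse 'cls xc yc w h' (normalized) into (cls, x1, y1, x2, y2) in pixels."""
    cls_s, xc_s, yc_s, w_s, h_s = line.split()
    cx, bw = float(xc_s) * img_w, float(w_s) * img_w
    cy, bh = float(yc_s) * img_h, float(h_s) * img_h
    # Corners, rounded to whole pixels and clamped to the image bounds.
    x1 = max(0, round(cx - bw / 2))
    y1 = max(0, round(cy - bh / 2))
    x2 = min(img_w, round(cx + bw / 2))
    y2 = min(img_h, round(cy + bh / 2))
    return int(cls_s), x1, y1, x2, y2


def read_yolo_file(text: str, img_w: int, img_h: int) -> list[tuple[int, int, int, int, int]]:
    """Convert every non-empty line of a YOLO annotation file."""
    return [yolo_to_pixels(ln, img_w, img_h) for ln in text.splitlines() if ln.strip()]
```

With an image loaded as an array, the crop is then image[y1:y2, x1:x2].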

2.4 Image processing

To construct a dataset suitable for computer vision tasks, additional processing was applied to the cyclist region-of-interest (ROI) images extracted from the original RGB images. This processing stage includes manual semantic segmentation of the cyclist and the generation of relative depth maps, as illustrated in Figure 5.

Figure 5. RGB image processing stages.

(a) Manual segmentation of the cyclist from RGB images using the LabelMe annotation tool, where the region corresponding to the cyclist is precisely delineated. (b) Depth map estimation from RGB images using an encoder–decoder model, generating a complementary geometric representation of the cyclist and the surrounding environment.

Polygon-based semantic segmentation of the cyclist was performed using the LabelMe annotation tool ( Russell et al., 2008), which supports polygon-based annotation and is widely used in computer vision applications. As illustrated in Figure 5a, for each RGB image, the region corresponding to the cyclist was manually delineated using polygonal annotations. This procedure was applied to the entire dataset to ensure a consistent and precise definition of the region of interest across all samples. The resulting annotations were stored in JSON format, preserving the geometric information required for subsequent processing and reuse.
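LabelMe JSON files list the annotated shapes under a shapes key, each with a label, a list of [x, y] points, and a shape_type, alongside imageWidth and imageHeight. The sketch below rasterizes the polygons into a binary mask with an even-odd point-in-polygon test at pixel centers. The label name "cyclist" is a hypothetical placeholder, and for full-size images a library rasterizer (for example Pillow's ImageDraw.polygon) would be much faster than this pure-Python loop.

```python
import json


def load_labelme(path: str) -> dict:
    """Read a LabelMe annotation file."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def point_in_polygon(px: float, py: float, poly: list[tuple[float, float]]) -> bool:
    """Even-odd (ray casting) test for a point against a closed polygon."""
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > py) != (y2 > py):  # edge straddles the horizontal ray
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if px < x_cross:
                inside = not inside
    return inside


def labelme_to_mask(doc: dict, label: str = "cyclist") -> list[list[int]]:
    """Binary mask (1 = inside) from the polygon shapes carrying the given label."""
    h, w = doc["imageHeight"], doc["imageWidth"]
    mask = [[0] * w for _ in range(h)]
    for shape in doc["shapes"]:
        if (shape.get("shape_type") or "polygon") != "polygon" or shape["label"] != label:
            continue
        poly = [(float(x), float(y)) for x, y in shape["points"]]
        for y in range(h):
            for x in range(w):
                if point_in_polygon(x + 0.5, y + 0.5, poly):  # test pixel centers
                    mask[y][x] = 1
    return mask
```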

In parallel, relative depth maps were estimated from the RGB images using the Depth Anything model (Yang et al., 2024), as shown in Figure 5b. Depth Anything follows a foundation-model paradigm and is trained at large scale using unlabeled data, enabling robust generalization across diverse visual scenes. Although alternative approaches such as MiDaS have demonstrated strong performance through supervised and weakly supervised training on curated datasets (Ranftl et al., 2020; Birkl et al., 2023), Depth Anything was selected for its ability to produce consistent relative depth estimates without task-specific supervision. The depth maps included in the CYCLIST+IMU dataset represent relative (non-metric) depth and are stored as 8-bit grayscale JPEG images, in which depth values were normalized and linearly mapped to the range [0, 255] before saving. These depth maps are provided as complementary geometric context to the RGB images and semantic annotations.
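The normalization step can be sketched as a min-max rescaling followed by rounding to 8-bit integers. Per-image min-max scaling is an assumption here; the text states only that the values were normalized and linearly mapped to [0, 255]. Two consequences are worth keeping in mind when reusing the maps: per-image scaling means that gray levels are not comparable across images, and because JPEG is lossy, the stored values may differ slightly from the values computed before saving.

```python
def depth_to_uint8(depth: list[list[float]]) -> list[list[int]]:
    """Min-max normalize a relative depth map and map it linearly to 0..255."""
    values = [v for row in depth for v in row]
    lo, hi = min(values), max(values)
    if hi == lo:  # flat map: avoid division by zero
        return [[0 for _ in row] for row in depth]
    scale = 255.0 / (hi - lo)
    return [[int(round((v - lo) * scale)) for v in row] for row in depth]
```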

2.5 Data augmentation

To enhance the coverage of cyclist orientations and increase the angular diversity of the dataset, a data augmentation strategy based on geometric transformations was applied. This strategy was designed to expand the range of represented orientations while preserving the physical coherence and visual consistency of the samples.

Data augmentation was performed by horizontally flipping the RGB images. This operation generates additional samples by mirroring the original images about the vertical axis, increasing orientation diversity without introducing synthetic visual artifacts or altering the geometric structure of the cyclist and the surrounding environment.

To maintain consistency between the visual content and the associated orientation labels, yaw and roll values were deterministically updated following the transformation. Under horizontal flipping, yaw angles were transformed according to: yaw′ = 360° − yaw. Meanwhile, roll values were inverted as: roll′ = − roll. This transformation was applied only to samples belonging to selected underrepresented yaw intervals, as identified during the dataset validation stage ( Section 3.1).
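A minimal sketch of the flip-consistent label update follows, assuming detection boxes in normalized YOLO coordinates and polygons in pixel coordinates spanning [0, W]. The yaw and roll rules are the ones stated above; the modulo keeps yaw′ inside [0°, 360°) when yaw = 0°. The box and polygon updates are included for illustration only, since the text reports that these annotations were checked against the flipped images but does not give the update formulas. The AnnotatedSample type is hypothetical; the image itself could be mirrored with, for example, Pillow's ImageOps.mirror.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class AnnotatedSample:
    """Hypothetical container for the labels that must follow a flip."""
    yaw: float                                # degrees in [0, 360)
    roll: float                               # degrees
    bbox: tuple[float, float, float, float]   # YOLO (xc, yc, w, h), normalized
    polygon: tuple[tuple[float, float], ...]  # (x, y) in pixels
    image_width: int


def flip_yaw(yaw: float) -> float:
    """yaw' = 360 - yaw, wrapped so that 0 maps to 0 rather than 360."""
    return (360.0 - yaw) % 360.0


def flip_sample(s: AnnotatedSample) -> AnnotatedSample:
    """Labels for the horizontally mirrored image."""
    xc, yc, w, h = s.bbox
    return replace(
        s,
        yaw=flip_yaw(s.yaw),
        roll=-s.roll,                 # roll' = -roll
        bbox=(1.0 - xc, yc, w, h),    # mirror the normalized box center
        polygon=tuple((s.image_width - x, y) for x, y in s.polygon),
    )
```

Because both rules are involutions, flipping a flipped sample returns the original labels, which is a convenient sanity check.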

An illustrative example of the data augmentation process is shown in Figure 6, where the original image, the augmented image, and the corresponding updates to yaw and roll angles are presented.

Figure 6. Example of data augmentation using horizontal image flipping.

The original RGB image and the corresponding augmented image obtained by horizontal flipping are shown. Yaw and roll angles are deterministically updated to preserve geometric consistency, with yaw′ = 360° − yaw and roll′ = −roll.

2.6 Dataset structure and organization

The dataset is organized using a hierarchical directory structure designed to preserve traceability between original acquisitions and all derived data products. At the top level, the dataset is divided into two main directories: Original_Image, containing the raw RGB images, and Image_Crops, which stores all processed cyclist-centered data.

Within Image_Crops, data are grouped into acquisition-level subdirectories labeled Adq_x, where x indexes an independent synchronized recording session integrating RGB imagery and inertial measurement unit (IMU) data. Each Adq_x directory contains four modality-specific subfolders: RGB_IMAGE (cropped RGB images), Detection (region-of-interest detection files), Polygons (polygon-based semantic segmentation annotations in JSON format), and Depth_Map (relative depth maps).

Each acquisition session also includes a metadata file (Adq_x.xlsx) consolidating inertial information, including cyclist orientation parameters (yaw and roll), and unique identifiers linking all data modalities. File naming follows the convention Adq_x (n), ensuring a strict one-to-one correspondence among RGB images, detection files, segmentation annotations, depth maps, and inertial measurements.
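Because every modality shares the Adq_x (n) stem, the files belonging to one sample can be located by path construction alone. The sketch below assumes that the session metadata file sits inside its Adq_x directory, that x is an integer, and that extensions are lowercase; the actual case in the archive (Table 2 lists the formats in uppercase) and the column names inside Adq_x.xlsx should be checked before use. The workbook could then be read with, for example, pandas.read_excel, which needs the openpyxl engine for .xlsx files.

```python
from pathlib import PurePosixPath

# Hypothetical mapping: modality -> (subfolder, extension). Folder names follow
# Section 2.6; lowercase extensions are an assumption.
MODALITIES = {
    "rgb": ("RGB_IMAGE", ".jpg"),
    "detection": ("Detection", ".txt"),
    "polygons": ("Polygons", ".json"),
    "depth": ("Depth_Map", ".jpg"),
}


def sample_paths(root: str, session: int, n: int) -> dict[str, PurePosixPath]:
    """Paths of all files for sample 'Adq_<session> (<n>)' (layout assumed)."""
    adq = f"Adq_{session}"
    base = PurePosixPath(root) / "Image_Crops" / adq
    stem = f"{adq} ({n})"
    paths = {key: base / folder / f"{stem}{ext}" for key, (folder, ext) in MODALITIES.items()}
    paths["metadata"] = base / f"{adq}.xlsx"  # assumed location of the session workbook
    return paths
```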

An overview of the dataset structure is provided in Figure 7, and a summary of dataset contents and file formats is presented in Table 2.

Figure 7. Hierarchical organization of the CYCLIST+IMU dataset.

The dataset is structured to preserve traceability between the original multimodal acquisitions and all derived data products. Raw RGB images are stored in the Original_Image directory, while processed cyclist-centered data are organized under Image_Crops by acquisition session (Adq_x). Each Adq_x represents a synchronized acquisition unit integrating RGB imagery and inertial measurement unit (IMU) data, including cyclist orientation parameters such as yaw and roll observed from the acquisition vehicle. For each session, cropped RGB images, region-of-interest detection files, relative depth maps, and polygon-based semantic segmentation annotations are stored in modality-specific subdirectories using a consistent naming convention that ensures a strict one-to-one correspondence across data types.

Table 2. Summary of the generated dataset structure.

Source: Authors.

Data type	Format	Quantity (samples)
Cyclist RGB images	.JPG	4324
Segmentation polygons	.JSON	4324
Depth maps	.JPG	4324
Cyclist detection annotations	.TXT	4324
IMU data	.XLSX	4324

Table 2 summarizes the data types, file formats, and sample counts of the dataset generated in this work.

3. Dataset validation

A dataset validation procedure was conducted to assess the internal consistency, angular coverage, and coherence of the visual–inertial orientation metadata. Validation focuses on descriptive and structural properties of the dataset rather than on model performance, in accordance with the scope of a Data Note.

3.1 Angular consistency and coverage

The CYCLOPS orientation angles include yaw and roll, which require different statistical treatments due to their measurement domains. Yaw is a periodic variable defined over [0°, 360°), while roll is restricted to a narrow, non-periodic range around zero (approximately −20° to 20°).

Yaw distributions were analysed using circular statistics to avoid discontinuities at the 0°/360° boundary, computing circular mean, median, and deviation with the pycircstat Python library. Roll values were characterised using standard linear descriptive statistics (mean, median, and standard deviation).
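The need for circular statistics is easy to see with two headings of 350° and 10°: their arithmetic mean is 180°, although both lie within 10° of 0°. A minimal pure-Python sketch of the circular mean and of the circular standard deviation sqrt(−2 ln R), where R is the mean resultant length, is given below. It illustrates the standard definitions rather than reproducing pycircstat's code; pycircstat works in radians and may offer more than one dispersion measure, so which one underlies the reported "circular deviation" should be checked against the library documentation.

```python
import math


def _mean_vector(angles_deg: list[float]) -> tuple[float, float]:
    """Mean of the unit vectors (cos a, sin a)."""
    n = len(angles_deg)
    c = sum(math.cos(math.radians(a)) for a in angles_deg) / n
    s = sum(math.sin(math.radians(a)) for a in angles_deg) / n
    return c, s


def circular_mean_deg(angles_deg: list[float]) -> float:
    """Direction of the mean vector, in [0, 360)."""
    c, s = _mean_vector(angles_deg)
    if math.hypot(c, s) < 1e-12:
        raise ValueError("mean direction is undefined (vectors cancel out)")
    m = math.degrees(math.atan2(s, c)) % 360.0
    return 0.0 if m == 360.0 else m


def circular_std_deg(angles_deg: list[float]) -> float:
    """Circular standard deviation sqrt(-2 ln R), in degrees."""
    c, s = _mean_vector(angles_deg)
    r = min(math.hypot(c, s), 1.0)  # guard against R slightly above 1
    if r < 1e-12:
        return math.inf
    return math.degrees(math.sqrt(-2.0 * math.log(r)))
```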

Figure 8 shows the yaw and roll distributions prior to data augmentation, and Table 3 reports the corresponding descriptive statistics, which characterize the angular coverage of cyclist orientations under real-world riding conditions.

Figure 8. Distribution of angular variables acquired by the CYCLOPS system.

(a) Circular distribution of cyclist orientation angles (yaw), represented using circular statistics to account for the periodic nature of the variable over the [0°, 360°) range. (b) Linear distribution of cyclist inclination angles (roll), analysed using linear statistics due to their bounded range around 0°.

Table 3. Descriptive statistics of the angular variables (yaw and roll) in the CYCLOPS dataset.

Source: Authors.

Statistic	Yaw (circular)	Roll (linear)
Total samples	3606	3606
Mean	297.05°	0.44°
Median	283.81°	−0.81°
Deviation	141.12° (circular deviation)	9.68° (standard deviation)

3.2 Validation of the data augmentation strategy

The data augmentation procedure based on horizontal image flipping was validated to ensure geometric consistency between RGB images and orientation angles. After augmentation, yaw and roll values were deterministically updated following the transformation rules defined in the Methods section.

Consistency between augmented images, updated orientation angles, and the associated segmentation polygons, detection files, and relative depth maps was verified through visual inspection. Figure 6 illustrates the augmentation process, while Figure 9 shows the post-augmentation distributions of yaw and roll, highlighting the angular regions targeted to improve coverage.
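The rule-based part of this check, that each augmented record carries yaw′ = 360° − yaw and roll′ = −roll relative to its source record, can also be verified automatically on the metadata. The sketch below assumes the (original, augmented) pairs have already been matched, for example by file name; the pairing step, the function names, and the tolerance are hypothetical. It complements, rather than replaces, the visual inspection described above.

```python
def angular_difference(a: float, b: float) -> float:
    """Smallest absolute difference between two angles in degrees."""
    return abs((a - b + 180.0) % 360.0 - 180.0)


def flip_pair_is_consistent(original: tuple[float, float],
                            augmented: tuple[float, float],
                            tol: float = 1e-6) -> bool:
    """Check (yaw, roll) of an augmented record against its source record."""
    yaw, roll = original
    yaw_aug, roll_aug = augmented
    yaw_ok = angular_difference(yaw_aug, 360.0 - yaw) <= tol
    roll_ok = abs(roll_aug + roll) <= tol
    in_range = 0.0 <= yaw_aug < 360.0  # flags an unwrapped 360 produced from yaw = 0
    return yaw_ok and roll_ok and in_range


def inconsistent_pairs(pairs: list[tuple[tuple[float, float], tuple[float, float]]]) -> list[int]:
    """Indices of (original, augmented) pairs that violate the flip rules."""
    return [i for i, (o, a) in enumerate(pairs) if not flip_pair_is_consistent(o, a)]
```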

Figure 9. Post-augmentation distribution of cyclist orientation and inclination variables.

(a) Circular distribution of orientation angles (yaw) after data augmentation. Angular bins highlighted in red indicate the yaw intervals selected for horizontal flipping and sample augmentation. (b) Linear distribution of inclination angles (roll) after data augmentation. Red-highlighted bins correspond to the roll range associated with the augmented samples, while blue bins represent the remaining original and augmented data.

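Table 4 summarizes the descriptive statistics after augmentation. Relative to Table 3, the summary statistics change only moderately (for example, circular mean 307.15° versus 297.05°), while Figure 9 shows the targeted yaw intervals populated by the added samples, indicating improved angular coverage without altering the overall structure of the original dataset.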

Table 4. Descriptive statistics of the angular variables (yaw and roll) after data augmentation in the CYCLOPS dataset.

Source: Authors.

Statistic	Yaw (circular)	Roll (linear)
Total samples	4324	4324
Mean	307.15°	−0.34°
Median	295.56°	−0.56°
Deviation	140.22° (circular deviation)	8.94° (standard deviation)

Ethical considerations

The data presented in this Data Note were collected in public urban environments under natural traffic conditions. The acquisition protocol consisted of recording cyclist behavior in real-world settings without clinical intervention, behavioral manipulation, or collection of personal identifiers.

All cyclists recorded in this dataset were adult volunteers known to the research team and were fully informed about the purpose of the data acquisition and the intended public release of the dataset. Informed consent for participation and publication of anonymized data was obtained verbally prior to data collection. Written consent was not deemed necessary because no personally identifiable information was collected, and all visual data were anonymized prior to public release.

No personal data such as names, identification numbers, contact information, or biometric identifiers were collected or stored during acquisition. Prior to publication, all captured visual images included in both the publicly released dataset and this manuscript were automatically processed using a YOLO-based face detection model, and any detected facial regions were anonymized through Gaussian blurring (21 × 21 kernel) to prevent individual identification. This anonymization procedure was systematically applied to all applicable captured images before public release.
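The blurring step can be illustrated with a separable Gaussian filter restricted to each detected face box. This is a sketch under assumptions: the face boxes are taken as given (the YOLO-based detector is not specified), the image is a single-channel list of rows, pixels beyond each box edge are replaced by the nearest edge pixel of the box, and a sigma of about 3.5 follows the rule OpenCV applies when sigma is left at 0 for a 21-pixel kernel. In practice a library call such as OpenCV's cv2.GaussianBlur(roi, (21, 21), 0) on each face region would be the usual choice; its border handling differs, so values near the box edges will not match this sketch exactly.

```python
import math

Image = list[list[float]]  # single-channel image, row-major


def gaussian_kernel(ksize: int = 21, sigma: float = 0.0) -> list[float]:
    """Normalized 1-D Gaussian weights; sigma <= 0 uses OpenCV's default rule."""
    if sigma <= 0:
        sigma = 0.3 * ((ksize - 1) * 0.5 - 1) + 0.8  # about 3.5 for ksize = 21
    half = ksize // 2
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma)) for i in range(-half, half + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_box(img: Image, box: tuple[int, int, int, int], ksize: int = 21) -> Image:
    """Return a copy of img with a Gaussian blur applied inside box = (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box  # x2, y2 exclusive
    k = gaussian_kernel(ksize)
    half = ksize // 2
    region = [row[x1:x2] for row in img[y1:y2]]
    h, w = len(region), len(region[0])

    def clamp(v: int, hi: int) -> int:
        return min(max(v, 0), hi)  # replicate the box border

    # Separable filter: horizontal pass, then vertical pass.
    tmp = [[sum(k[j] * region[y][clamp(x + j - half, w - 1)] for j in range(ksize))
            for x in range(w)] for y in range(h)]
    out = [[sum(k[j] * tmp[clamp(y + j - half, h - 1)][x] for j in range(ksize))
            for x in range(w)] for y in range(h)]

    result = [row[:] for row in img]
    for y in range(h):
        result[y1 + y][x1:x2] = out[y]
    return result


def anonymize(img: Image, face_boxes: list[tuple[int, int, int, int]]) -> Image:
    """Blur every face box supplied by an (external, unspecified) face detector."""
    for box in face_boxes:
        img = blur_box(img, box)
    return img
```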

Derived data products such as segmentation masks, depth maps, and inertial metadata do not contain identifiable facial information.

Because the released dataset does not contain personally identifiable information and consists of non-interventional observational recordings conducted with informed adult volunteers in public environments, this study qualifies as research without risk according to Colombian national regulations governing health research involving human participants (Resolution 8430 of 1993, Ministry of Health of Colombia). Under these regulations and applicable institutional guidelines, formal approval from an Institutional Review Board (IRB) or ethics committee was not required.

The individuals shown in Figures 4, 5, and 6, as well as all individuals appearing in the publicly released dataset, correspond to the same adult volunteers who provided informed consent for participation and publication of anonymized images. No third-party individuals were intentionally included in the dataset.

The study was conducted in accordance with the ethical principles outlined in the Declaration of Helsinki, insofar as applicable to non-interventional observational data collection.

Data availability

Open Science Framework (OSF). CYCLIST+IMU: A synchronized visual–inertial dataset for cyclist orientation and perception in urban environments. https://doi.org/10.17605/OSF.IO/HVPKZ (Gómez-Meneses et al., 2026).

This project contains the following underlying data:

• CYCLIST_IMU_Dataset.zip (Complete dataset including RGB images, cyclist-centered image crops, inertial measurement files (.xlsx), semantic segmentation polygons (.json), region-of-interest detection annotations (.txt), and relative depth maps (.jpg), organized by acquisition session.)

Data are available under the terms of the Creative Commons Attribution 4.0 International (CC BY 4.0) license.

Acknowledgements

The authors have no acknowledgements to declare.

References

Arias-Correa, Robledo, Londoño: CYCLOPS: A cyclists’ orientation data acquisition system using RGB camera and inertial measurement units (IMU). HardwareX. 2024;18:e00534. PMID: 38690150. PMCID: PMC11059332. 10.1016/j.ohx.2024.e00534

Badue, Guidolini, Carneiro: Self-driving cars: A survey. Expert Syst. Appl. 2019;165:113816. 10.1016/j.eswa.2020.113816

Birkl, Wofk, Müller: MiDaS v3.1 – A model zoo for robust monocular relative depth estimation. arXiv preprint arXiv:2307.14460. 2023.

Brown, Laurier: The trouble with autopilots: Assisted and autonomous driving on the social road. Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems. ACM; 2017; pp.416–429. 10.1145/3025453.3025462

Brown, Broth, Vinkhuyzen: The halting problem: Video analysis of self-driving cars in traffic. Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems. ACM; 2023; pp.1–14. 10.1145/3544548.3581045

Chiang, Zhong, Ding: AllTheDocks road safety dataset: A cyclist's perspective and experience. 2024 IEEE 99th Vehicular Technology Conference (VTC2024-Spring). IEEE; 2024, June; pp.1–5.

Corral-Soto, Liu, Ren: 3DArticCyclists: Generating Synthetic Articulated 8D Pose-Controllable Cyclist Data for Computer Vision Applications. 2025 IEEE Intelligent Vehicles Symposium (IV). IEEE; 2025a, June; pp.2114–2121.

Corral-Soto, Liu, Ren: Monocular Visual 8D Pose Estimation for Articulated Bicycles and Cyclists. arXiv preprint arXiv:2510.20158. 2025b.

Desai, Etemad, Greenspan: CycleCrash: A Dataset of Bicycle Collision Videos for Collision Prediction and Analysis. 2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV). IEEE; 2025, February; pp.6688–6698.

Flohr: Vulnerable road user detection and orientation estimation for context-aware automated driving. Doctoral dissertation, Universiteit van Amsterdam. UvA-DARE (Digital Academic Repository); 2018.

Goren, Caesar: BikeScenes: Online LiDAR Semantic Segmentation for Bicycles. arXiv preprint arXiv:2510.25901. 2025.

Ghoul, Sayed: Cyclist safety assessment using autonomous vehicles. Accid. Anal. Prev. 2025;212:107923. PMID: 39837243. 10.1016/j.aap.2025.107923

Gómez-Meneses, Arias-Correa, Herrera-Ramírez: CYCLIST+IMU: A synchronized visual–inertial dataset for cyclist orientation and perception in urban environments [Data set]. Open Science Framework. 2026. 10.17605/OSF.IO/HVPKZ

Beck, Rathnayake: A benchmark for cycling close pass detection from video streams. Transportation Research Part C: Emerging Technologies. 2025;174:105112. 10.1016/j.trc.2025.105112

Zhu: Empowering safer socially sensitive autonomous vehicles using human-plausible cognitive encoding. Proc. Natl. Acad. Sci. 2025;122(21):e2401626122. PMID: 40388625. PMCID: PMC12130892. 10.1073/pnas.2401626122

Mannion: Vulnerable road user detection: State-of-the-art and open challenges. arXiv preprint arXiv:1902.03601. 2019.

Narula, Tyagi: Autonomous cars: A comprehensive survey. 2023 Seventh International Conference on Image Information Processing (ICIIP). IEEE; 2023; pp.586–590. 10.1109/ICIIP61524.2023.10537704

Thrun: Toward robotic cars. Commun. ACM. 2010;53(4):99–106. 10.1145/1721654.1721679

Ranftl, Lasinger, Hafner: Towards robust monocular depth estimation: Mixing datasets for zero-shot cross-dataset transfer. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2022;44(3):1623–1637. 10.1109/TPAMI.2020.3019967

Russell, Torralba, Murphy: LabelMe: A Database and Web-Based Tool for Image Annotation. Int. J. Comput. Vis. 2008;77(1–3):157–173. 10.1007/s11263-007-0090-8

World Health Organization: Global status report on road safety 2023. 2023.

Yang, Kang, Huang: Depth anything: Unleashing the power of large-scale unlabeled data. arXiv. 2024.

Yan, Hang: OnSiteVRU: A High-Resolution Trajectory Dataset for High-Density Vulnerable Road Users. arXiv preprint arXiv:2503.23365. 2025.

10.5256/f1000research.195711.r483535

Reviewer response for version 1

Ammar Al-Taie, Referee. KAIST, Daejeon, South Korea. ORCID: https://orcid.org/0000-0002-5156-6245

Competing interests: No competing interests were disclosed.

1 June 2026

This is an open access peer review report distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Recommendation: approve with reservations

Thank you to the authors for submitting this dataset of cyclists captured at different angles to support automated vehicle (AV)–cyclist interactions.

The researchers clearly show a strong need for this dataset, as AVs must safely detect and interact with cyclists on shared roads. However, I felt that the authors could make the introduction more accessible to readers as it is currently too technical. For example, rather than saying "Cyclist detection persists as a challenge for AV perception systems, primarily due to the inherent visual complexity associated with non-rigid articulations, highly variable aspect ratios, and a diverse range of spatial orientations" the authors can clearly explain that cyclists are likely to encounter AVs across a range of traffic scenarios, such as intersections, roundabouts or lane merging. This would ground the justification to real use cases and make the introduction more accessible.

The authors also only determine the orientation of the cyclists in the dataset; I would have liked to see a more elaborate labeling, e.g., specifying the traffic scenario, such as whether it is a roundabout. This could help AVs predict the likely cyclist orientation across different scenarios.

Moreover, the paper is lacking key citations from the field, including the work of Al-Taie, Von Sawitzky, and Matviienko, who have researched how AVs can communicate with cyclists.

Overall, I feel this is a well-justified dataset. However, it needs a discussion and motivation that would further ground the dataset into real-world scenarios, and citations of key papers in the field.

Are sufficient details of methods and materials provided to allow replication by others?

Partly

Is the rationale for creating the dataset(s) clearly described?

Yes

Are the datasets clearly presented in a useable and accessible format?

Yes

Are the protocols appropriate and is the work technically sound?

Partly

Reviewer Expertise:

I am a Human-Computer Interaction researcher investigating how Automated Vehicles can successfully and safely communicate their intentions to surrounding road users.

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.

10.5256/f1000research.195711.r479065

Reviewer response for version 1

Giuseppina Pappalardo, Referee. University of Catania, Catania, Italy

Competing interests: No competing interests were disclosed.

13 May 2026

Recommendation: approve with reservations

The paper presents a valuable multimodal dataset (CYCLIST+IMU) that combines visual and inertial data to improve cyclist perception in urban environments. The dataset is well-structured, technically sound, and addresses a relevant gap in current research by including cyclist orientation information.

However, several key issues need to be addressed to ensure scientific robustness:

Participant description is insufficient: there is no clear information on the number, demographics, or variability of cyclists, limiting representativeness.

Limited scenario diversity: data are collected under constrained conditions (e.g., daytime only), reducing generalizability.

Validation is weak: it relies mainly on descriptive statistics, without assessing annotation quality, synchronization accuracy, or practical usability.

Figures are unclear: their size and design make it difficult to interpret results and understand improvements (e.g., after data augmentation).

Are sufficient details of methods and materials provided to allow replication by others?

Is the rationale for creating the dataset(s) clearly described?

Yes

Are the datasets clearly presented in a useable and accessible format?

Yes

Are the protocols appropriate and is the work technically sound?

Yes

Reviewer Expertise:

Cyclist safety