PEOD Dataset - Pixel-aligned Event-RGB Object Detection

Dataset Overview

PEOD addresses critical limitations in existing Event-RGB datasets: sparse coverage of extreme conditions and low spatial resolution (≤640×480)

57%

of data captured under challenging conditions (low-light, overexposure, high-speed motion)

120+ Synchronized Sequences

340K Verified Bounding Boxes

72K High-Quality Data Pairs

1280×720 High Resolution

>87dB High Dynamic Range

30Hz Annotation Frequency

Coaxial Dual-Camera System

True pixel-level spatial alignment and microsecond-level temporal synchronization using beam-splitter optical system

Challenging Environments

Urban, suburban, and tunnel environments with 60% data under extreme conditions (low-light, overexposed, high-velocity)

Six Object Classes

Meticulously verified annotations for car, bus, truck, two-wheeler, three-wheeler, and person

Microsecond Precision

Hardware signal generator ensures microsecond-level synchronization between RGB and event cameras

Split	Sequences	Frames	Bounding Boxes	Conditions
Training	~85	~57,000	270k	Diverse illumination & motion conditions
Test	~35	~15,000	70k	Held-out sequences for benchmarking
Total	120+	72k	340k	Complete dataset coverage

Challenging Scenarios Showcase

Representative examples demonstrating the limitations of conventional cameras and the advantages of event-based sensing

Challenging Scenarios from PEOD Dataset These examples demonstrate sensor failure scenarios where conventional frame-based cameras struggle: (Top and third rows) Overexposure in high dynamic range scenes leading to complete information loss; (Second row) Motion blur from high-speed movement; (Bottom row) Extreme low-light conditions with severe underexposure. In contrast to the degraded RGB images, the event-based streams (left side of each pair) consistently preserve crucial structural details and object boundaries, enabling robust detection even under these extreme conditions. This showcases the complementary nature of the two modalities and the critical importance of multimodal fusion for all-weather perception systems.

Overexposure Resilience

Event cameras maintain object visibility when RGB sensors saturate under bright lighting conditions

Motion Blur Immunity

High temporal resolution eliminates motion blur artifacts that degrade conventional camera performance

Low-Light Excellence

Superior performance in extreme low-light where RGB cameras fail to capture meaningful information

Dual-Camera Acquisition System

Our custom coaxial optical system ensures pixel-perfect alignment and precise synchronization

Coaxial Imaging System Our acquisition system utilizes a JCOPTIX OSB25R55-T5 non-polarizing plate beam splitter (50:50 split ratio) and MCC1-1S 10mm coaxial cube. The system comprises a Prophesee EVK4 HD event camera (1280×720) and Hikvision MV-CS050-10UC industrial RGB camera (2448×2048, 60fps), both equipped with identical Hikvision 25mm C-mount fixed-focal-length lenses for consistent imaging characteristics.

Hardware Synchronization

Single square-wave signal generator provides hardware trigger pulses to both cameras for microsecond-level accuracy

Pixel-Level Alignment

Shared optical path enables true pixel-level spatial alignment through standard stereo rectification

Identical Optics

Both cameras use identical lenses and fixed aperture settings to eliminate focal length and distortion discrepancies

Dataset Statistics & Visualizations

Comprehensive analysis of data distribution and sample annotations from challenging scenarios

Dataset statistics and sample visualizations

Dataset Statistics and Challenging Scenarios The figure shows the temporal distribution of our dataset with 57.1% captured under challenging illumination conditions, alongside sample aligned Event-RGB pairs from diverse driving scenarios. Our data includes urban roads, suburban areas, complex intersections, underpasses, tunnels, and highways captured continuously from 04:00 to 24:00, covering the full spectrum of lighting conditions from dawn to nighttime.

Illumination Analysis

Quantitative classification using under-exposure (S_LL) and over-exposure (S_OE) scores based on pixel saturation

Low-Light Conditions

Extensive coverage of challenging scenarios where conventional cameras fail but event cameras excel

Overexposure Scenarios

High dynamic range scenes demonstrating the >87dB HDR advantage of event cameras over RGB sensors

Benchmark Results

Comprehensive evaluation of RGB-only, Event-only, and Event+RGB fusion detectors

Modality	Method	mAP_50:95	mAP₅₀	mAP₇₅	Params(M)	Notes
Event-only	SMamba	22.9%	43.8%	19.9%	23.7	State-space model; highest event accuracy
Event-only	RVT	17.6%	31.9%	17.0%	4.4	Transformer-based approach
RGB-only	YOLOv8	19.3%	31.8%	18.8%	11.1	Best performing frame-based baseline
RGB-only	YOLOX	17.4%	37.0%	14.2%	8.9	Strong single-stage detector
Fusion	EOLO	24.2%	40.3%	26.1%	46.2	SNN + CSPDN fusion; top performer
Fusion	RENet	22.5%	39.6%	21.3%	37.7	ResNet-101 based fusion

Key Finding: Fusion models consistently outperform single-modality baselines, with EOLO achieving 24.2% mAP_50:95, surpassing the best Event-only model by +1.3pp and the best RGB baseline by +4.9pp.

Condition-Specific Performance

Extreme Illumination

Event-only methods excel with SMamba achieving 19.8% mAP, significantly outperforming RGB-only (13.0%) and fusion methods

Normal Conditions

Fusion methods dominate with EOLO reaching 43.2% mAP, exceeding RGB-only by +8.5pp and Event-only by +22.5pp

Usage Example

Simple Python code to get started with the PEOD dataset

import numpy as np
from pathlib import Path
import cv2

# Paths to frames, events and annotation file
frame_dir = Path('PEOD/train/sequence_001/rgb')
event_file = Path('PEOD/train/sequence_001/events.dat')
anno_file = Path('PEOD/train/sequence_001/boxes.npy')

# Load one RGB frame
img = cv2.imread(str(frame_dir / '000000.png'))

# Load bounding boxes (N × 5: frame_idx, class_id, x, y, w, h)
boxes = np.load(anno_file, allow_pickle=True)

# Draw first box on the first frame
first_box = boxes[0]
frame_idx, cls_id, x, y, w, h = first_box
cv2.rectangle(img, (int(x), int(y)), (int(x+w), int(y+h)), (0, 255, 0), 2)

print(f"Loaded {len(boxes)} annotations for sequence")
cv2.imshow('PEOD Example', img)
cv2.waitKey(0)

Data Formats: Event data is provided in both RAW and DAT formats, while annotations are available in NumPy format for easy integration with existing workflows.

Dataset Download

The PEOD dataset will be publicly released upon paper acceptance

Coming Soon

Dataset will be released publicly upon paper acceptance

Multiple Formats

Event data in RAW and DAT formats, annotations in NumPy format

Easy Access

Download links and documentation will be provided here

Code & Tools

Evaluation scripts and baseline implementations included

Citation

If you use PEOD in your research, please cite our paper (currently under review)

Status: Paper submitted to AAAI 2026 (currently under peer review)

@article{PEOD2026,
  title     = {PEOD: Pixel-aligned High-Resolution Event-RGB Dataset for Challenging Object Detection},
  author    = {[Authors to be revealed upon acceptance]},
  journal   = {Proceedings of the AAAI Conference on Artificial Intelligence},
  year      = {2026},
  note      = {Under review}
}

Full citation details will be updated upon paper acceptance