Pixel-aligned Event-RGB Object Detection under Challenging Scenarios
The first large-scale multimodal benchmark providing synchronized high-resolution event streams and RGB images for object detection under extreme conditions
PEOD addresses two critical limitations of existing Event-RGB datasets: sparse coverage of extreme conditions and low spatial resolution (≤640×480). In contrast, 60% of PEOD's data is captured under challenging conditions (low-light, overexposure, high-speed motion).
True pixel-level spatial alignment and microsecond-level temporal synchronization via a beam-splitter optical system
Urban, suburban, and tunnel environments, with 60% of the data captured under extreme conditions (low-light, overexposure, high-speed motion)
Meticulously verified annotations for car, bus, truck, two-wheeler, three-wheeler, and person
Hardware signal generator ensures microsecond-level synchronization between RGB and event cameras
| Split | Sequences | Frames | Bounding Boxes | Conditions |
|---|---|---|---|---|
| Training | ~85 | ~57,000 | 270k | Diverse illumination & motion conditions |
| Test | ~35 | ~15,000 | 70k | Held-out sequences for benchmarking |
| Total | 120+ | 72k | 340k | Complete dataset coverage |
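As a quick sanity check against these counts, the per-split box totals can be tallied directly. A minimal sketch, assuming the directory layout used in the quick-start example further below:

```python
from pathlib import Path
import numpy as np

def count_boxes(split_dir):
    """Sum bounding-box annotations across all sequences in a split."""
    return sum(len(np.load(f, allow_pickle=True))
               for f in Path(split_dir).glob('*/boxes.npy'))

print('train boxes:', count_boxes('PEOD/train'))
print('test boxes: ', count_boxes('PEOD/test'))
```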
Representative examples demonstrating the limitations of conventional cameras and the advantages of event-based sensing
Event cameras maintain object visibility when RGB sensors saturate under bright lighting conditions
High temporal resolution eliminates motion blur artifacts that degrade conventional camera performance
Superior performance in extreme low-light where RGB cameras fail to capture meaningful information
Our custom coaxial optical system ensures pixel-perfect alignment and precise synchronization
Single square-wave signal generator provides hardware trigger pulses to both cameras for microsecond-level accuracy
Shared optical path enables true pixel-level spatial alignment through standard stereo rectification
Both cameras use identical lenses and fixed aperture settings to eliminate focal length and distortion discrepancies
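Because both sensors share the same trigger pulses, pairing an RGB frame with its co-occurring events reduces to a binary search over the time-ordered event timestamps. A minimal sketch with synthetic timestamps; the ±5 ms window and ~30 fps trigger rate are illustrative assumptions, not PEOD's actual capture settings:

```python
import numpy as np

def events_for_frame(ev_ts, frame_ts, idx, half_window_us=5000):
    """Slice of events within +/- half_window_us of frame idx's trigger time.

    ev_ts must be sorted ascending (event streams arrive in time order).
    """
    t = frame_ts[idx]
    lo = np.searchsorted(ev_ts, t - half_window_us, side='left')
    hi = np.searchsorted(ev_ts, t + half_window_us, side='right')
    return slice(lo, hi)

# Toy usage with synthetic microsecond timestamps
ev_ts = np.cumsum(np.random.randint(1, 50, size=100_000))  # sorted event times
frame_ts = np.arange(0, ev_ts[-1], 33_333)                 # ~30 fps triggers
sl = events_for_frame(ev_ts, frame_ts, idx=10)
print(f"Frame 10 pairs with {sl.stop - sl.start} events")
```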
Comprehensive analysis of data distribution and sample annotations from challenging scenarios
Quantitative classification of illumination using under-exposure (S_LL) and over-exposure (S_OE) scores computed from pixel saturation (a sketch follows below)
Extensive coverage of challenging scenarios where conventional cameras fail but event cameras excel
High dynamic range scenes demonstrating the >87 dB dynamic-range advantage of event cameras over RGB sensors
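The exact S_LL / S_OE definitions are given in the paper; as a rough stand-in, both can be approximated as the fraction of under- or over-saturated pixels. A minimal sketch with illustrative thresholds (25 and 230 are assumptions, not PEOD's published values):

```python
import cv2
import numpy as np

def exposure_scores(img_bgr, dark_thresh=25, bright_thresh=230):
    """Approximate under-exposure (S_LL) and over-exposure (S_OE) scores
    as the fractions of near-black and near-saturated pixels."""
    gray = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY)
    s_ll = float(np.mean(gray <= dark_thresh))    # under-exposure score
    s_oe = float(np.mean(gray >= bright_thresh))  # over-exposure score
    return s_ll, s_oe

img = cv2.imread('PEOD/train/sequence_001/rgb/000000.png')
print('S_LL = {:.3f}, S_OE = {:.3f}'.format(*exposure_scores(img)))
```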
Comprehensive evaluation of RGB-only, Event-only, and Event+RGB fusion detectors
| Modality | Method | mAP50:95 | mAP50 | mAP75 | Params (M) | Notes |
|---|---|---|---|---|---|---|
| Event-only | SMamba | 22.9% | 43.8% | 19.9% | 23.7 | State-space model; highest event-only accuracy |
| Event-only | RVT | 17.6% | 31.9% | 17.0% | 4.4 | Transformer-based approach |
| RGB-only | YOLOv8 | 19.3% | 31.8% | 18.8% | 11.1 | Best-performing frame-based baseline |
| RGB-only | YOLOX | 17.4% | 37.0% | 14.2% | 8.9 | Strong single-stage detector |
| Fusion | EOLO | 24.2% | 40.3% | 26.1% | 46.2 | SNN + CSPDN fusion; top performer |
| Fusion | RENet | 22.5% | 39.6% | 21.3% | 37.7 | ResNet-101-based fusion |
Key Finding: Fusion models consistently outperform single-modality baselines, with EOLO achieving 24.2% mAP50:95, surpassing the best Event-only model by +1.3pp and the best RGB baseline by +4.9pp.
Under challenging illumination, event-only methods excel: SMamba achieves 19.8% mAP, significantly outperforming RGB-only (13.0%) and even the fusion methods
Under normal conditions, fusion methods dominate: EOLO reaches 43.2% mAP, exceeding RGB-only by +8.5pp and Event-only by +22.5pp
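The mAP numbers above follow the standard COCO protocol (mAP50:95 averages over IoU thresholds 0.50 to 0.95). A minimal evaluation sketch with pycocotools, assuming PEOD's NumPy annotations and your detections have been converted to COCO-format JSON (both file names here are hypothetical):

```python
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

coco_gt = COCO('peod_test_gt.json')               # ground truth (converted)
coco_dt = coco_gt.loadRes('peod_test_dets.json')  # detector outputs

evaluator = COCOeval(coco_gt, coco_dt, iouType='bbox')
evaluator.evaluate()
evaluator.accumulate()
evaluator.summarize()
print(evaluator.stats[:3])  # [mAP50:95, mAP50, mAP75]
```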
Simple Python code to get started with the PEOD dataset
import numpy as np
from pathlib import Path
import cv2
# Paths to frames, events, and the annotation file
frame_dir = Path('PEOD/train/sequence_001/rgb')
event_file = Path('PEOD/train/sequence_001/events.dat')  # events are loaded separately (see below)
anno_file = Path('PEOD/train/sequence_001/boxes.npy')

# Load one RGB frame
img = cv2.imread(str(frame_dir / '000000.png'))

# Load bounding boxes (N × 6: frame_idx, class_id, x, y, w, h)
boxes = np.load(anno_file, allow_pickle=True)
print(f"Loaded {len(boxes)} annotations for sequence")

# Draw the first box on the frame
frame_idx, cls_id, x, y, w, h = boxes[0]
cv2.rectangle(img, (int(x), int(y)), (int(x + w), int(y + h)), (0, 255, 0), 2)

cv2.imshow('PEOD Example', img)
cv2.waitKey(0)
cv2.destroyAllWindows()
Data Formats: Event data is provided in both RAW and DAT formats, while annotations are available in NumPy format for easy integration with existing workflows.
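PEOD's exact DAT schema is not spelled out above, so the reader below is a sketch assuming the widely used Prophesee DAT 2.0 encoding (ASCII header lines starting with '%', two bytes giving event type and size, then packed 32-bit timestamp/data word pairs). The bit layout, file path, and the 1280×720 resolution are assumptions to adjust once the official documentation is released:

```python
import numpy as np

def load_dat_events(path):
    """Sketch of a .dat event reader assuming Prophesee's DAT 2.0 layout.

    Returns (t, x, y, p): microsecond timestamps, pixel coordinates, polarities.
    """
    with open(path, 'rb') as f:
        # Skip ASCII header lines beginning with '%'
        while True:
            pos = f.tell()
            if not f.readline().startswith(b'%'):
                f.seek(pos)
                break
        f.read(2)  # one byte for event type, one for event size
        raw = np.fromfile(f, dtype='<u4')
    t = raw[0::2]                                  # timestamps in microseconds
    data = raw[1::2]                               # packed x / y / polarity word
    x = (data & 0x3FFF).astype(np.int32)           # bits 0-13
    y = ((data >> 14) & 0x3FFF).astype(np.int32)   # bits 14-27
    p = ((data >> 28) & 0xF).astype(np.int8)       # polarity nibble
    return t, x, y, p

# Accumulate a signed event frame for quick visualization (assumed resolution)
t, x, y, p = load_dat_events('PEOD/train/sequence_001/events.dat')
frame = np.zeros((720, 1280), dtype=np.int32)
np.add.at(frame, (y, x), np.where(p > 0, 1, -1))
```

If the RAW files follow Prophesee's EVT encoding, the Metavision SDK's EventsIterator is the standard way to decode them.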
The PEOD dataset will be publicly released upon paper acceptance
Event data in RAW and DAT formats, annotations in NumPy format
Download links and documentation will be provided here
Evaluation scripts and baseline implementations included
If you use PEOD in your research, please cite our paper (currently under review)
Status: Paper submitted to AAAI 2026 (currently under peer review)
@inproceedings{PEOD2026,
  title     = {PEOD: Pixel-aligned High-Resolution Event-RGB Dataset for Challenging Object Detection},
  author    = {[Authors to be revealed upon acceptance]},
  booktitle = {Proceedings of the AAAI Conference on Artificial Intelligence},
  year      = {2026},
  note      = {Under review}
}
Full citation details will be updated upon paper acceptance