Format

Directory Structure

The directory structure of the CoSEC dataset is shown below:

CoSEC/
├── Train/
│   ├── Day/
│   │   ├── Campus/
│   │   │   ├── 000/
│   │   │   │   ├── img_co_left/
│   │   │   │   ├── img_co_right/
│   │   │   │   ├── depth_co/
│   │   │   │   ├── ...
│   │   │   │   ├── events_co_left.h5
│   │   │   │   ├── events_co_right.h5
│   │   │   │   ├── intrinsics.json
│   │   │   │   ├── extrinsics.json
│   │   │   │   └── timestamps.txt
│   │   │   ├── 001/
│   │   │   └── ...
│   │   ├── City/
│   │   │   └── ...
│   │   ├── Park/
│   │   │   └── ...
│   │   ├── Suburbs/
│   │   │   └── ...
│   │   └── Village/
│   │       └── ...
│   └── Night/
│       ├── Campus/
│       │   └── ...
│       └── ...
└── Test/
    └── ...

Note that not all sequences contain all types of labels, and the test set does not include any labels. Please use the filters on the Download page to find sequences available for specific tasks.

Images

The img_co_left/ and img_co_right/ folders contain left and right color image sequences.

img_co_left/
├── 000000.png
├── 000001.png
├── 000002.png
└── ...

img_co_right/
├── 000000.png
├── 000001.png
├── 000002.png
└── ...

Images are stored as .png files and named by frame index. Both left and right images have a resolution of 1200 × 624 pixels.

Files with the same index in img_co_left/ and img_co_right/ correspond to each other.

For example:

img_co_left/000010.png
img_co_right/000010.png

are the corresponding left and right images.

Events

Each sequence contains event streams from the left and right event cameras.

events_co_left.h5
events_co_right.h5

The event streams are stored in HDF5 format and have been aligned with the image frames.

A typical event file contains the following fields:

Field Description
x Event x coordinates
y Event y coordinates
p Event polarities
t Event timestamps
ms_to_idx Mapping from millisecond timestamp to event index

The event timestamps t are stored in microseconds. The ms_to_idx array can be used to quickly locate event indices by millisecond-level timestamps.

For a given image frame, users can use the corresponding timestamp in timestamps.txt to extract a temporal window of events. For example, the events within 50 ms before the image timestamp can be used together with the image frame.

Example code:

import os
import cv2
import h5py
import numpy as np

seq_dir = "path/to/sequence"

frame_id = 0
event_window_ms = 50

# Load image timestamp.
timestamps = np.loadtxt(os.path.join(seq_dir, "timestamps.txt"), dtype=np.int64)
img_ts_us = timestamps[frame_id]

# Load image frame.
img_path = os.path.join(seq_dir, "img_co_left", f"{frame_id:06d}.png")
img = cv2.imread(img_path, cv2.IMREAD_UNCHANGED)

# Load event file.
event_path = os.path.join(seq_dir, "events_co_left.h5")
with h5py.File(event_path, "r") as events:
    ms_to_idx = events["ms_to_idx"][:]

    # Select events within [img_ts_us - event_window_ms, img_ts_us].
    start_us = img_ts_us - event_window_ms * 1000
    end_us = img_ts_us

    # ms_to_idx is indexed by millisecond timestamp; floor to milliseconds and
    # clamp to the valid range so early frames do not produce negative indices.
    start_ms = max(0, int(start_us // 1000))
    end_ms = min(len(ms_to_idx) - 1, int(end_us // 1000))

    start_idx = ms_to_idx[start_ms]
    end_idx = ms_to_idx[end_ms]

    xs = events["x"][start_idx:end_idx]
    ys = events["y"][start_idx:end_idx]
    ps = events["p"][start_idx:end_idx]
    ts = events["t"][start_idx:end_idx]

# img and the selected event segment can then be used together.

Calibration

Each sequence provides two calibration files:

intrinsics.json
extrinsics.json

Intrinsics

Camera intrinsic parameters are stored in intrinsics.json.

Since the released images and event streams are already aligned and rectified, users should use the Co_Rect_L and Co_Rect_R parameters for the left and right data.

The format is:

"Co_Rect_L": {
    "K": [
        [fx,  0, cx],
        [ 0, fy, cy],
        [ 0,  0,  1]
    ],
    "resolution": [height, width]
}

Here, K is the camera intrinsic matrix. fx and fy are the focal lengths, and cx and cy are the principal point coordinates.

The resolution field is given in [height, width] order. For the released rectified image and event data, the resolution is:

[624, 1200]

The left and right rectified streams use:

Co_Rect_L
Co_Rect_R
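As a sketch of how K is used, the snippet below projects a 3D point in the rectified left camera frame to pixel coordinates. The fx, fy, cx, and cy values here are made up for illustration; real values should be read from the Co_Rect_L entry of intrinsics.json.

```python
import numpy as np

# Hypothetical values for illustration; in practice, read them from
# intrinsics.json, e.g. K = np.array(json.load(f)["Co_Rect_L"]["K"]).
K = np.array([
    [1000.0,    0.0, 600.0],  # fx,  0, cx
    [   0.0, 1000.0, 312.0],  #  0, fy, cy
    [   0.0,    0.0,   1.0],
])

# Project a 3D point (in the rectified left camera frame) to pixel coordinates.
point_3d = np.array([1.0, 0.5, 10.0])  # x, y, z in meters
uv_h = K @ point_3d                    # homogeneous image coordinates
u, v = uv_h[:2] / uv_h[2]              # perspective division
```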

Extrinsics

Extrinsic parameters are stored in extrinsics.json.

For the released aligned data, the stereo relationship between the right and left co-axial systems is provided by:

Co_R_to_Co_L

The format is:

"Co_R_to_Co_L": {
    "R": [
        [r11, r12, r13],
        [r21, r22, r23],
        [r31, r32, r33]
    ],
    "T": [
        [tx],
        [ty],
        [tz]
    ]
}

Here, R is the rotation matrix and T is the translation vector from the right co-axial system to the left co-axial system.

Other entries in extrinsics.json provide transformations between the original camera, event camera, rectified, and aligned coordinate systems.
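As an illustration, the sketch below applies the Co_R_to_Co_L transformation to a 3D point, i.e. p_left = R @ p_right + T. The R and T values are placeholders (identity rotation and a made-up baseline), not calibration results; real values should be loaded from extrinsics.json.

```python
import numpy as np

# Placeholder values for illustration; in practice, read them from
# extrinsics.json, e.g. calib = json.load(f)["Co_R_to_Co_L"].
R = np.eye(3)                        # rectified stereo: rotation near identity
T = np.array([[0.3], [0.0], [0.0]])  # made-up 0.3 m baseline along x

# Transform a point from the right co-axial frame to the left co-axial frame.
p_right = np.array([[1.0], [2.0], [10.0]])  # column vector, meters
p_left = R @ p_right + T
```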

Timestamps

Each sequence provides a timestamps.txt file. Each line stores the timestamp of one image frame pair, and the order is consistent with the frame order in img_co_left/ and img_co_right/.

The timestamp can be regarded as the exposure end timestamp of the corresponding left and right image pair.

For example:

img_co_left/000000.png   img_co_right/000000.png   ->   1st line in timestamps.txt
img_co_left/000001.png   img_co_right/000001.png   ->   2nd line in timestamps.txt
img_co_left/000002.png   img_co_right/000002.png   ->   3rd line in timestamps.txt

For the training set, the depth maps in depth_co/ share the same timestamps as the corresponding image frame pairs.

The timestamps are stored as integer values in microseconds. They can be used to associate each image pair or depth sample with a corresponding segment of the event stream.

Labels

Depth

Ground-truth depth maps are stored as .png files in the depth_co/ folder.

depth_co/
├── 000000.png
├── 000001.png
├── 000002.png
└── ...

Depth labels are only provided for the training set.

Depth maps are named by frame index and correspond to the image frame pairs with the same index.

For example:

depth_co/000010.png
img_co_left/000010.png
img_co_right/000010.png

are corresponding samples.

The depth values are stored as uint16 scaled depth. If the metric depth is represented in meters, the stored value is:

stored_depth = metric_depth_in_meters * 256

To recover the metric depth in meters:

metric_depth_in_meters = stored_depth / 256.0
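The decoding above can be sketched as follows. The array here is synthetic; in practice the stored depth would come from reading a depth_co/*.png with cv2.imread(path, cv2.IMREAD_UNCHANGED), which yields a uint16 array. Treating zero-valued pixels as missing ground truth is an assumption on our part, not something stated above; please verify it against the released data.

```python
import numpy as np

# Synthetic stand-in for a uint16 depth map loaded from depth_co/*.png.
stored = np.array([[0, 256, 2560],
                   [128, 512, 65535]], dtype=np.uint16)

# Recover metric depth in meters.
depth_m = stored.astype(np.float32) / 256.0

# Assumption: zero-valued pixels mark missing ground truth.
valid = stored > 0
```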

Segmentation

Segmentation annotations are provided in JSON format in the segment_co/ folder.

segment_co/
├── 000000.json
├── 000001.json
├── 000002.json
└── ...

Segmentation labels are only provided for the training set.

Each JSON file is named by frame index and corresponds to the image frame pair with the same index.

For example:

segment_co/000010.json
img_co_left/000010.png
img_co_right/000010.png

are corresponding samples.

The segmentation annotations are stored as polygon-based JSON files rather than rasterized label maps. Each JSON file contains image-level metadata and a list of annotated polygon regions.

A typical annotation file follows the structure below:

{
  "version": "2.4.0",
  "imagePath": "000000.png",
  "imageWidth": 1200,
  "imageHeight": 624,
  "shapes": [
    {
      "label": "road",
      "points": [
        [610.0, 211.3],
        [605.7, 215.4],
        [600.5, 221.7]
      ],
      "group_id": null,
      "description": "",
      "shape_type": "polygon"
    },
    {
      "label": "car",
      "points": [
        [10.1, 257.7],
        [20.3, 258.4],
        [23.6, 302.6]
      ],
      "group_id": 0,
      "description": "stationary",
      "shape_type": "polygon"
    },
    {
      "label": "rider",
      "points": [
        [1038.2, 265.5],
        [1034.0, 270.1],
        [1022.1, 263.7]
      ],
      "group_id": 2,
      "description": "moving",
      "shape_type": "polygon"
    }
  ]
}

The main fields are:

Field Description
version Annotation format version
imagePath Name of the corresponding image file
imageWidth Image width in pixels
imageHeight Image height in pixels
shapes List of annotated polygon regions
label Semantic category of the annotated region
points Polygon vertices in image coordinates
group_id Optional instance or association ID
description Optional object state, such as moving or stationary
shape_type Shape type, usually polygon

The coordinate system follows the image coordinate convention. The origin is located at the top-left corner of the image, the x-axis points to the right, and the y-axis points downward.

For example, a point:

[x, y] = [610.0, 211.3]

means that the vertex is located at pixel coordinate x = 610.0, y = 211.3 in the image.

When converting the polygon annotations into pixel-wise semantic label maps, please use the following class ID order:

Class ID Class Name
0 road
1 sidewalk
2 building
3 wall
4 fence
5 pole
6 traffic light
7 traffic sign
8 vegetation
9 terrain
10 sky
11 person
12 rider
13 car
14 truck
15 bus
16 train
17 motorcycle
18 bicycle

In addition to the semantic classes listed above, the JSON annotations may also contain the following extra labels:

Label Description
ignore Regions that should be ignored during training or evaluation
static Static regions or objects not assigned to one of the official semantic classes
dynamic Dynamic regions or objects not assigned to one of the official semantic classes

These extra labels are not part of the official 19-class semantic ID order. When generating training or evaluation masks, pixels annotated as ignore should be assigned the ignore label. We recommend using 255 as the ignore label.

For the static and dynamic labels, users may either keep them as auxiliary labels for custom experiments or map them to the ignore label for standard 19-class semantic segmentation.

Some dynamic objects contain additional state information in the description field, such as:

moving
stationary

The description field is not part of the semantic class ID. For semantic segmentation, objects with the same label should be mapped to the same class ID regardless of their motion state.

The group_id field can be used to associate related instances. For example, a rider and the corresponding motorcycle or bicycle may share the same group_id, which can be useful for instance-level or object-level analysis. For semantic segmentation, the group_id field is optional and does not affect the semantic class ID.

Users can convert the polygon annotations into pixel-wise semantic label maps according to their own training or evaluation pipeline. During conversion, each polygon should be rasterized to the corresponding class index.

The segmentation annotations are aligned with the released rectified color images and event streams. Therefore, the segmentation label of frame 000010 can be used together with:

img_co_left/000010.png
img_co_right/000010.png
events_co_left.h5
events_co_right.h5
timestamps.txt

The timestamp of the corresponding frame can be found from the same line index in timestamps.txt.