Raw Image Data
Color + Spatial Info
Raw 3D Points
3D Geometric Info
Corners (N × 8 × 3)
Ground Truth
Load rgb.jpg
BGR to RGB
Resize to 480×608
Augment (Train): Flip, Rotate, Crop
Mean: [0.485, 0.456, 0.406]
Std: [0.229, 0.224, 0.225]
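
The image branch above can be sketched in a few lines of NumPy. This is only an illustration: the function name `preprocess_image` is hypothetical, loading and resizing are left out (the BGR to RGB step suggests an OpenCV-style loader, but the diagram does not say), and reading 480×608 as height × width with a channels-first output is an assumption.

```python
import numpy as np

# Channel statistics quoted in the diagram, in RGB order.
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def preprocess_image(bgr_uint8):
    """BGR uint8 image (H, W, 3) -> normalized float32 RGB array (3, H, W)."""
    rgb = bgr_uint8[..., ::-1]                  # BGR to RGB
    x = rgb.astype(np.float32) / 255.0          # scale to [0, 1]
    x = (x - MEAN) / STD                        # per-channel normalization
    # Channels-first layout is an assumption about what the backbone expects.
    return np.ascontiguousarray(x.transpose(2, 0, 1))

# Stand-in for an image already resized to 480x608 (H=480, W=608 assumed).
dummy = np.zeros((480, 608, 3), dtype=np.uint8)
dummy[..., 0] = 255                             # pure blue in BGR order
out = preprocess_image(dummy)
```

Train-time augmentation (flip, rotate, crop) would be applied before normalization and is not shown here.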
Load pc.npy
Reshape to (N, 3)
Remove NaN, Keep z > 0
Valid Points Only
Max 8192 Points
Random Sample or Zero-Pad
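
A minimal sketch of the point cloud steps above, assuming that points with z > 0 are the ones kept and that short clouds are padded with all-zero rows at the end. The name `prepare_points` and the `rng` argument are illustrative, not taken from the source.

```python
import numpy as np

MAX_POINTS = 8192  # fixed cloud size from the diagram

def prepare_points(raw, max_points=MAX_POINTS, rng=None):
    """Return a float32 array of shape (max_points, 3)."""
    rng = np.random.default_rng() if rng is None else rng
    pts = np.asarray(raw, dtype=np.float32).reshape(-1, 3)   # reshape to (N, 3)
    # Keep rows with no NaN coordinate and positive depth.
    valid = ~np.isnan(pts).any(axis=1) & (pts[:, 2] > 0)
    pts = pts[valid]
    n = len(pts)
    if n >= max_points:
        # Too many points: random subsample without replacement.
        keep = rng.choice(n, size=max_points, replace=False)
        return pts[keep]
    # Too few points: zero-pad at the end.
    padded = np.zeros((max_points, 3), dtype=np.float32)
    padded[:n] = pts
    return padded
```

Zero-padded rows sit at the origin, so a downstream model may need a validity mask to tell them apart from real points; whether the original pipeline keeps one is not stated.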
Load bbox3d.npy
(N × 8 × 3)
Corners to (center, size, quat)
PCA-Based Fitting
Max 21 Objects
Zero-Pad or Truncate
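
The diagram does not spell out the PCA-based fitting, so the following is one plausible reading rather than the original implementation. It takes the mean of the 8 corners as the center, uses the eigenvectors of the corner covariance as box axes, measures the extent along each axis as the size, and converts the axis matrix to a (w, x, y, z) quaternion. Ordering the sizes by descending variance and returning a validity mask are both assumptions, as are all function names.

```python
import numpy as np

MAX_OBJECTS = 21  # fixed number of object slots from the diagram

def rotmat_to_quat(R):
    """3x3 rotation matrix -> unit quaternion in (w, x, y, z) order."""
    q = np.zeros(4)
    t = np.trace(R)
    if t > 0:
        s = 2.0 * np.sqrt(t + 1.0)
        q[0] = 0.25 * s
        q[1] = (R[2, 1] - R[1, 2]) / s
        q[2] = (R[0, 2] - R[2, 0]) / s
        q[3] = (R[1, 0] - R[0, 1]) / s
    else:
        # Branch on the largest diagonal entry for numerical stability.
        i = int(np.argmax(np.diag(R)))
        j, k = (i + 1) % 3, (i + 2) % 3
        s = 2.0 * np.sqrt(1.0 + R[i, i] - R[j, j] - R[k, k])
        q[0] = (R[k, j] - R[j, k]) / s
        q[1 + i] = 0.25 * s
        q[1 + j] = (R[j, i] + R[i, j]) / s
        q[1 + k] = (R[k, i] + R[i, k]) / s
    return q / np.linalg.norm(q)

def corners_to_box(corners):
    """(8, 3) corners -> 10-vector (center, size, quaternion)."""
    center = corners.mean(axis=0)
    centered = corners - center
    # PCA: eigenvectors of the corner scatter matrix approximate the box axes.
    _, vecs = np.linalg.eigh(centered.T @ centered)
    axes = vecs[:, ::-1].copy()            # columns sorted by descending variance
    if np.linalg.det(axes) < 0:            # keep a right-handed frame
        axes[:, 2] *= -1.0
    local = centered @ axes                # corners in the box frame
    size = local.max(axis=0) - local.min(axis=0)
    return np.concatenate([center, size, rotmat_to_quat(axes)])

def encode_boxes(all_corners, max_objects=MAX_OBJECTS):
    """Truncate or zero-pad to max_objects rows; the mask marks real objects."""
    targets = np.zeros((max_objects, 10), dtype=np.float32)
    mask = np.zeros(max_objects, dtype=bool)
    for i, corners in enumerate(all_corners[:max_objects]):
        targets[i] = corners_to_box(np.asarray(corners, dtype=np.float64))
        mask[i] = True
    return targets, mask
```

PCA axes are ambiguous when two box dimensions are equal (a cube, for instance), and the sign of each axis is arbitrary, so the recovered quaternion is only one of several equivalent orientations.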
Pre-trained on ImageNet
1536D Features
1536D → 512D
Linear + ReLU + Dropout
512-dimensional
Semantic Features
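
The projection step of the RGB branch (1536D → 512D, Linear + ReLU + Dropout) can be illustrated without a deep learning framework. The sketch below uses random weights and a random stand-in for the backbone output. The dropout rate of 0.5 and the name `projection_head` are assumptions, and the backbone itself is not modeled because the diagram does not name it.

```python
import numpy as np

def projection_head(x, W, b, drop_p=0.5, training=False, rng=None):
    """Linear + ReLU + (inverted) Dropout; drop_p is an assumed value."""
    h = np.maximum(x @ W + b, 0.0)                 # Linear + ReLU
    if training and drop_p > 0.0:
        rng = np.random.default_rng() if rng is None else rng
        keep = rng.random(h.shape) >= drop_p
        h = h * keep / (1.0 - drop_p)              # rescale kept activations
    return h

rng = np.random.default_rng(0)
W_rgb = rng.normal(0.0, 0.02, size=(1536, 512))    # illustrative random weights
b_rgb = np.zeros(512)
backbone_feats = rng.normal(size=(2, 1536))        # stand-in for backbone output
semantic = projection_head(backbone_feats, W_rgb, b_rgb)
```

The same function would serve the 1024D → 512D projection in the point cloud branch with a `(1024, 512)` weight matrix.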
Dynamic Graph CNN
k=20 neighbors
3 Layers: [64, 64, 64]
Graph Feature Learning
Max Pool → 1024D
Permutation Invariant
1024D → 512D
Linear + ReLU + Dropout
512-dimensional
Geometric Features
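
To make the graph feature step concrete, here is a small NumPy sketch of a dynamic graph CNN encoder with k=20 and three 64-channel layers. It follows the usual edge convolution recipe: build a k-nearest-neighbor graph in the current feature space, form edge features from each point and its neighbor offsets, apply a shared linear map with ReLU, then take the max over neighbors. The per-point lift to 1024D before the global max pool is an assumption about how the diagram reaches 1024D, and the random weights, the absence of biases and normalization, and all names are illustrative.

```python
import numpy as np

K = 20  # neighbors per point, from the diagram

def knn_indices(feats, k):
    """Indices of the k nearest rows for each row (a point counts as its own neighbor)."""
    # Dense (N, N, C) differences: fine for a small demo, too large for 8192 points.
    diff = feats[:, None, :] - feats[None, :, :]
    d2 = np.sum(diff * diff, axis=-1)
    return np.argsort(d2, axis=1)[:, :k]

def edge_conv(feats, W, k=K):
    """One edge convolution: shared linear map + ReLU on edges, max over neighbors."""
    idx = knn_indices(feats, k)
    neighbors = feats[idx]                                   # (N, k, C)
    center = np.broadcast_to(feats[:, None, :], neighbors.shape)
    edge = np.concatenate([center, neighbors - center], axis=-1)   # (N, k, 2C)
    return np.maximum(edge @ W, 0.0).max(axis=1)             # (N, out_channels)

def dgcnn_global_feature(points, layer_weights, W_out, k=K):
    x = points
    for W in layer_weights:
        x = edge_conv(x, W, k)        # graph is rebuilt in feature space each layer
    x = np.maximum(x @ W_out, 0.0)    # per-point lift to 1024D (assumption)
    return x.max(axis=0)              # global max pool over points

rng = np.random.default_rng(0)
layer_weights = [rng.normal(0.0, 0.1, size=(2 * c_in, 64)) for c_in in (3, 64, 64)]
W_out = rng.normal(0.0, 0.1, size=(64, 1024))
cloud = rng.normal(size=(128, 3))     # small random cloud for the demo
feat = dgcnn_global_feature(cloud, layer_weights, W_out)
shuffled = dgcnn_global_feature(cloud[rng.permutation(128)], layer_weights, W_out)
```

Because every step is either applied per point or is a max over a set, reordering the input points leaves the pooled vector unchanged up to floating point error, which is what the "Permutation Invariant" label refers to.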
Combining RGB and Point Cloud Features
(x, y, z, w, h, l, q_w, q_x, q_y, q_z)
Center, Size, Quaternion
Objectness Probability
Sigmoid Output
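
The diagram does not say how the two 512D vectors are combined or how the heads are built. The sketch below assumes simple concatenation into a 1024D vector, one linear head that emits 10 box parameters for each of the 21 object slots, and one linear head whose sigmoid output is the objectness probability per slot. Normalizing the quaternion part to unit length is also an assumption, and the names and random weights are placeholders.

```python
import numpy as np

MAX_OBJECTS = 21  # object slots, matching the ground truth padding

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def detection_heads(semantic, geometric, params):
    """Fuse two 512D vectors and predict boxes (21, 10) plus objectness (21,)."""
    fused = np.concatenate([semantic, geometric], axis=-1)        # (1024,), assumed fusion
    boxes = (fused @ params["W_box"] + params["b_box"]).reshape(MAX_OBJECTS, 10)
    # Columns: x, y, z, w, h, l, q_w, q_x, q_y, q_z. Make the quaternion unit length.
    quat = boxes[:, 6:]
    boxes[:, 6:] = quat / (np.linalg.norm(quat, axis=1, keepdims=True) + 1e-8)
    objectness = sigmoid(fused @ params["W_obj"] + params["b_obj"])
    return boxes, objectness

rng = np.random.default_rng(0)
params = {
    "W_box": rng.normal(0.0, 0.05, size=(1024, MAX_OBJECTS * 10)),
    "b_box": np.zeros(MAX_OBJECTS * 10),
    "W_obj": rng.normal(0.0, 0.05, size=(1024, MAX_OBJECTS)),
    "b_obj": np.zeros(MAX_OBJECTS),
}
semantic = rng.normal(size=512)     # stand-in for the RGB branch output
geometric = rng.normal(size=512)    # stand-in for the point cloud branch output
boxes, objectness = detection_heads(semantic, geometric, params)
```

At inference time, slots whose objectness falls below a chosen threshold would typically be discarded; the threshold is not given in the source.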