Digitizing data on intersections is essential for promoting digital transformation in road traffic management and governance. In this study, we developed a semantic segmentation model that automatically identifies intersection components, such as roadways, sidewalks, lane segment lines, stop lines, median strips, vehicles, occlusions, and others, from aerial photographs. First, we prepared an annotated data set covering 11 intersection components on aerial photographs with a ground resolution of 5 cm. Then, SegFormer-B5, a transformer-based model, was fine-tuned from a pre-trained model using our annotated data. We found that SegFormer-B5 could identify the intersection components with high accuracy, although prediction accuracy was lower for intersections with poorly maintained road markings, intersections located in industrial areas, and intersections with four or more legs. Regarding the robustness of prediction accuracy to the quality of the input aerial photographs, accuracy at a 10-cm ground resolution was comparable to that at 5 cm, whereas accuracy deteriorated dramatically at a 20-cm ground resolution. In addition, the results showed that the semantic segmentation model should not be applied to aerial photographs whose ground resolution is unknown. This suggests the importance of making the prediction model robust to ground resolution, which is recommended for future work.