Video super-resolution often reconstructs high-resolution (HR) video from low-resolution (LR) video that has been downsampled with predefined methods, which is an ill-posed problem. Recent video rescaling algorithms alleviate this problem by jointly training the downsampling and upsampling processes. However, they primarily exploit shallow temporal correlations among video frames, overlooking the intricate, long-term deep temporal dependencies within the video. In this paper, we propose a video rescaling network with omniscient feature alignment, namely OFA-VRN, which leverages bidirectional deep temporal information. In the downsampling phase, the proposed method separates the input HR video into LR frames and high-frequency components using the Haar wavelet transform and explicitly embeds the high-frequency components into the LR frames. In this way, detail information is preserved within the frames while the downsampled videos retain good visual quality. During the upsampling phase, we use an advanced bidirectional propagation paradigm to enhance temporal information aggregation. By incorporating the proposed omniscient feature alignment, the network can leverage multi-frame feature information along the triplet dimension to further alleviate misalignment, thereby strengthening its use of deep temporal information. Experiments on Vid4 and Vimeo90K-T demonstrate that our model achieves competitive performance compared to state-of-the-art methods.
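As a rough illustration of the downsampling step, the sketch below performs a single-level 2D Haar decomposition of one frame, splitting it into a 2x-downsampled low-frequency band and three high-frequency detail bands. The embedding of those detail bands back into the LR frames and the omniscient feature alignment itself are not shown, and the function name and normalization are illustrative rather than taken from the paper.

```python
import numpy as np

def haar_dwt2(frame: np.ndarray):
    """Single-level 2D Haar decomposition of an H x W frame (H, W even).

    Returns the low-frequency band LL (a 2x-downsampled frame) and the
    three high-frequency bands (LH, HL, HH) carrying the detail
    information that would be re-embedded into the LR frame.
    """
    a = frame[0::2, 0::2]
    b = frame[0::2, 1::2]
    c = frame[1::2, 0::2]
    d = frame[1::2, 1::2]
    ll = (a + b + c + d) / 2.0   # low-frequency (LR) component
    lh = (a - b + c - d) / 2.0   # horizontal detail
    hl = (a + b - c - d) / 2.0   # vertical detail
    hh = (a - b - c + d) / 2.0   # diagonal detail
    return ll, (lh, hl, hh)
```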
2023
TGRS
DOPNet: Dense Object Prediction Network for Multi-Class Object Counting and Localization in Remote Sensing Images
Object counting and localization for remote sensing images are effective means of solving large-scale object analysis problems. Most existing counting methods obtain the number of objects by employing a convolutional neural network to regress a density map. Although these leading methods have achieved impressive performance, they focus only on estimating the number of single-class objects, provide no location information, and cannot support multi-class objects. To tackle these problems, a point-based network named Dense Object Prediction Network (DOPNet) is proposed for multi-class object counting and localization in remote sensing images. DOPNet differs from the conventional approach of predicting multiple density maps by incorporating category attributes into the predicted objects, enabling accurate counting and localization of multi-class objects. Specifically, DOPNet adopts a multi-scale architecture to provide dense predictions of object proposals. A Scale Adaptive Feature Enhancement Module (SAFEM) is designed to predict object scales for the suppression of duplicate proposals. Given only point-level annotations for training, a pseudo box generation algorithm is designed to find the most suitable pseudo box for each annotated object to supervise scale learning. Comprehensive experiments show that DOPNet achieves strong performance on challenging counting benchmarks while also providing object locations. Code and pre-trained models are available at https://github.com/Ceoilmp/DOPNet.
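The pseudo box generation algorithm is only summarized above; as a hedged stand-in (not the authors' exact procedure), the sketch below derives a square pseudo box for each annotated point from its nearest-neighbor distance, which is one common way to approximate object scale from point labels alone.

```python
import numpy as np

def pseudo_boxes_from_points(points: np.ndarray, default_size: float = 16.0):
    """Illustrative pseudo-box heuristic (not the paper's exact algorithm).

    For each annotated point, the box side length is set to the distance
    to its nearest annotated neighbor, loosely reflecting object scale in
    densely packed remote sensing scenes.

    points: (N, 2) array of (x, y) annotations.
    Returns an (N, 4) array of boxes in (x1, y1, x2, y2) format.
    """
    n = len(points)
    boxes = np.zeros((n, 4), dtype=np.float32)
    for i, (x, y) in enumerate(points):
        if n > 1:
            d = np.linalg.norm(points - points[i], axis=1)
            d[i] = np.inf                 # ignore distance to itself
            side = float(d.min())
        else:
            side = default_size           # fallback for a single annotation
        half = side / 2.0
        boxes[i] = (x - half, y - half, x + half, y + half)
    return boxes
```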
2022
TGRS
Object Counting for Remote-Sensing Images via Adaptive Density Map-Assisted Learning
Object counting has attracted considerable attention in remote sensing image analysis. In density-map-based object counting algorithms, the ground truth density maps generated with fixed-size Gaussian kernels ignore the spatial features of the objects. In this paper, an Adaptive Density Map Assisted Learning algorithm (ADMAL) is proposed, which exploits the spatial features of the objects starting from the ground truth density map generation phase. ADMAL consists of two networks: a Contexture Aware Density Map Generation (CADMG) network and a Transformer-based Density Map Estimation (TDME) network. The CADMG network is designed to generate a ground truth density map from each annotated point map. Compared with Gaussian-convolved density maps, the ground truth density maps generated by CADMG are tailored to the texture and neighborhood relationships among objects, which improves the training of the TDME network. TDME is the core network for object counting. The backbone of the TDME network adopts a Swin Transformer structure, whose self-attention mechanism provides a larger receptive field for effective feature extraction in remote sensing images. Comprehensive experiments show that the ground truth density maps generated by CADMG help various density map estimation networks train more effectively, among which TDME achieves the best performance. Moreover, ADMAL achieves favorable object counting performance on both satellite-based and drone-based images. Code and pre-trained models are available at https://github.com/gcding/ADMAL-pytorch.
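To illustrate why adapting the kernel to object layout matters, the sketch below builds a geometry-adaptive Gaussian density map whose kernel width follows the local point spacing. This is a classic hand-crafted heuristic shown only for contrast with fixed-size kernels; it is not the learned CADMG network described in the paper, and the `beta` and `k` values are arbitrary.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def adaptive_density_map(points, shape, beta=0.3, k=3):
    """Geometry-adaptive Gaussian density map (illustrative, not CADMG).

    Each annotated point is spread with a Gaussian whose sigma is
    proportional to the mean distance to its k nearest neighbors, so the
    kernel shrinks in crowded regions and widens in sparse ones.
    """
    h, w = shape
    density = np.zeros((h, w), dtype=np.float32)
    pts = np.asarray(points, dtype=np.float32)
    for i, (x, y) in enumerate(pts):
        dists = np.sort(np.linalg.norm(pts - pts[i], axis=1))[1:k + 1]
        sigma = beta * dists.mean() if len(dists) else 8.0
        impulse = np.zeros((h, w), dtype=np.float32)
        impulse[int(min(y, h - 1)), int(min(x, w - 1))] = 1.0
        density += gaussian_filter(impulse, sigma)
    return density  # integrates (approximately) to the object count
```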
TMM
Crowd Counting via Unsupervised Cross-Domain Feature Adaptation
Given an image, crowd counting aims to estimate the number of target objects in the image. Owing to the unpredictable installation conditions of surveillance systems (or other equipment), crowd counting images from different datasets may exhibit severe discrepancies in viewing angle, scale, lighting conditions, etc. As annotating each dataset for model training is usually expensive and time-consuming, transferring a model trained on a labeled dataset (source domain) to a new dataset (target domain) has become an essential issue in crowd counting. To tackle this problem, we propose a cross-domain learning network that bridges the domain gap in an unsupervised manner. The proposed network comprises a Multi-granularity Feature-aware Discriminator (MFD) module, a Domain-Invariant Feature Adaptation (DFA) module, and a Cross-domain Vanishing Bridge (CVB) module, which together remove domain-specific information from the extracted features and improve the mapping performance of the network. Unlike most existing methods that use only a Global Feature Discriminator (GFD) to align features at the image level, an additional Local Feature Discriminator (LFD) is inserted and, together with the GFD, forms the MFD module. As a complement to the GFD, the LFD refines features at the pixel level and is able to align local features. The DFA module explicitly measures the distances between source domain features and target domain features and aligns the marginal distributions of their features with Maximum Mean Discrepancy (MMD). Finally, the CVB module provides an incremental capability to remove the impact of the interfering parts of the extracted features. Several well-known networks are adopted as the backbone of our algorithm to prove the effectiveness of the proposed adaptation structure. Comprehensive experiments demonstrate that our model achieves performance competitive with state-of-the-art methods. Code and pre-trained models are available at https://github.com/gcding/CDFA-pytorch.
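For the DFA module, the abstract mentions aligning marginal feature distributions with Maximum Mean Discrepancy; below is a minimal PyTorch sketch of a Gaussian-kernel MMD term between source and target feature batches. The bandwidth, feature shapes, and function name are assumptions, and the paper may use a multi-kernel or otherwise different formulation.

```python
import torch

def gaussian_mmd(source_feats: torch.Tensor, target_feats: torch.Tensor,
                 sigma: float = 1.0) -> torch.Tensor:
    """Simple (biased) Gaussian-kernel MMD^2 between two feature batches.

    source_feats: (N, D) features from the labeled source domain.
    target_feats: (M, D) features from the unlabeled target domain.
    Minimizing this term pulls the two marginal feature distributions
    together during training.
    """
    def kernel(a, b):
        d2 = torch.cdist(a, b) ** 2                # pairwise squared distances
        return torch.exp(-d2 / (2.0 * sigma ** 2))

    k_ss = kernel(source_feats, source_feats).mean()
    k_tt = kernel(target_feats, target_feats).mean()
    k_st = kernel(source_feats, target_feats).mean()
    return k_ss + k_tt - 2.0 * k_st
```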
CVPRW
A Coarse-To-Fine Boundary Localization Method for Naturalistic Driving Action Recognition
Guanchen Ding *, Wenwei Han *, Chenglong Wang *, Mingpeng Cui, Lin Zhou, Dianbo Pan, Jiayi Wang, Junxi Zhang, and Zhenzhong Chen
In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, Jun 2022
Naturalistic driving action recognition plays an important role in understanding drivers’ distraction behavior in the traffic environment. The main challenge of this task is accurately localizing the temporal boundary of each distracted driving behavior in the video. Although many temporal action localization methods can identify action classes, predicting accurate temporal boundaries is difficult for this task, since driving actions of the same category usually exhibit large intra-class variation. In this paper, we introduce a Coarse-to-Fine Boundary Localization method called CFBL, which obtains fine-grained temporal boundaries progressively through three stages. Concretely, in the first coarse boundary generation stage, we adopt a modified anchor-free model, the Anchor-Free Saliency-based Detector (AFSD), to make an interval estimate of the temporal boundaries of distraction behavior. In the second boundary refinement stage, we use the Dense Boundary Generation (DBG) model to adjust the estimated interval of the temporal boundaries. In the final boundary decision stage, we build a Localization Boundary Refinement Module to determine the final boundaries of different actions. In addition, we adopt a voting strategy that combines the results of different camera views to enhance the model’s ability to classify distracted driving actions. The experiments conducted on the Track 3 validation set of the 2022 AI City Challenge demonstrate the competitive performance of the proposed method.
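The multi-view voting strategy is described only at a high level; below is a minimal sketch of one plausible fusion rule, averaging per-view classification scores before taking the argmax. The view names and the averaging rule are assumptions, not the paper's exact strategy.

```python
import numpy as np

def fuse_view_scores(view_scores):
    """Fuse per-view classification scores for one temporal segment (illustrative).

    view_scores: dict mapping a camera view name (e.g. "dashboard",
    "rearview", "right_window") to a (num_classes,) score vector.
    Returns the fused class index and the averaged score vector; a hard
    majority vote over per-view argmaxes is an equally simple alternative.
    """
    stacked = np.stack(list(view_scores.values()), axis=0)  # (num_views, C)
    fused = stacked.mean(axis=0)
    return int(fused.argmax()), fused
```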
ICPR
The First Challenge on Moving Object Detection and Tracking in Satellite Videos: Methods and Results
Yulan Guo, Qian Yin, Qingyong Hu, Feng Zhang, Chao Xiao, Ye Zhang, Hanyun Wang, Chenguang Dai, Jian Yang, Zhuang Zhou, and 26 more authors
In 26th International Conference on Pattern Recognition, ICPR 2022, Montreal, QC, Canada, August 21-25, 2022
In this paper, we briefly summarize the first challenge on moving object detection and tracking in satellite videos (SatVideoDT). The challenge has three tracks related to satellite video analysis: moving object detection (Track 1), single object tracking (Track 2), and multiple object tracking (Track 3). In total, 123, 89, and 70 participants successfully registered, and 37, 42, and 29 teams submitted their final results on the test datasets for Tracks 1-3, respectively. The top-performing methods and their results in each track are described in detail. This challenge establishes a new benchmark for satellite video analysis.
ECCVW
Efficient and Accurate Quantized Image Super-Resolution on Mobile NPUs, Mobile AI & AIM 2022 Challenge: Report
Andrey Ignatov, Radu Timofte, Maurizio Denna, Abdel Younes, Ganzorig Gankhuyag, Jingang Huh, Myeong Kyun Kim, Kihwan Yoon, Hyeon-Cheol Moon, Seungho Lee, and 86 more authors
In Computer Vision - ECCV 2022 Workshops - Tel Aviv, Israel, October 23-27, 2022, Proceedings, Part III
Image super-resolution is a common task on mobile and IoT devices, where one often needs to upscale and enhance low-resolution images and video frames. While numerous solutions have been proposed for this problem in the past, they are usually not compatible with low-power mobile NPUs, which have many computational and memory constraints. In this Mobile AI challenge, we address this problem and ask the participants to design an efficient quantized image super-resolution solution that can demonstrate real-time performance on mobile NPUs. The participants were provided with the DIV2K dataset and trained INT8 models to perform high-quality 3X image upscaling. The runtime of all models was evaluated on the Synaptics VS680 Smart Home board with a dedicated edge NPU capable of accelerating quantized neural networks. All proposed solutions are fully compatible with this NPU, demonstrating up to 60 FPS when reconstructing Full HD images. A detailed description of all models developed in the challenge is provided in this paper.
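As a generic illustration of producing an INT8 model of the kind evaluated in the challenge, the snippet below applies TensorFlow Lite full-integer post-training quantization with a representative dataset. Individual challenge entries may instead rely on quantization-aware training or vendor-specific tooling, and the model path and calibration data here are placeholders.

```python
import tensorflow as tf

def quantize_sr_model(saved_model_dir, representative_images):
    """Post-training INT8 quantization of a super-resolution model (sketch).

    saved_model_dir: path to a float SavedModel (placeholder).
    representative_images: iterable of LR numpy arrays (e.g. DIV2K crops)
    used to calibrate activation ranges.
    Returns the serialized .tflite flatbuffer.
    """
    def representative_dataset():
        for img in representative_images:
            yield [img[None].astype("float32")]   # add a batch dimension

    converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    converter.representative_dataset = representative_dataset
    converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
    converter.inference_input_type = tf.uint8
    converter.inference_output_type = tf.uint8
    return converter.convert()
```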
2021
ICCVW
VisDrone-CC2021: The Vision Meets Drone Crowd Counting Challenge Results
Zhihao Liu, Zhijian He, Lujia Wang, Wenguan Wang, Yixuan Yuan, Dingwen Zhang, Jinglin Zhang, Pengfei Zhu, Luc Van Gool, Junwei Han, and 29 more authors
In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops, Oct 2021
Crowd counting research has evolved quickly by leveraging advances in deep learning. Many researchers have devoted their efforts to crowd counting tasks and achieved significant improvements. However, current datasets barely keep pace with this evolution, and high-quality evaluation data is urgently needed. Motivated by the need for high-quality, large-scale studies in crowd counting, we collect a drone-captured dataset of 5,468 images (2,734 RGB-thermal image pairs). There are 1,807 pairs of images for training and 927 pairs for testing. We manually annotate persons with points in each frame. Based on this dataset, we organized the Vision Meets Drone Crowd Counting Challenge (VisDrone-CC2021) in conjunction with the International Conference on Computer Vision (ICCV 2021). Our challenge attracted many participants, helping to accelerate progress in crowd counting. To summarize the competition, we select the most remarkable algorithms from the participants’ submissions and provide a detailed analysis of the evaluation results. More information can be found at the website: http://www.aiskyeye.com/.
CVPRW
Dual-Modality Vehicle Anomaly Detection via Bilateral Trajectory Tracing
Traffic anomaly detection plays a crucial role in Intelligent Transportation Systems (ITS). The main challenges of this task lie in the highly diversified anomaly scenes and varying lighting conditions. Although much work has managed to identify anomalies under homogeneous weather and scene conditions, few methods cope with complex ones. In this paper, we propose a dual-modality, modularized methodology for the robust detection of abnormal vehicles. We introduce an integrated anomaly detection framework comprising the following modules: background modeling, vehicle tracking with detection, mask construction, Region of Interest (ROI) backtracking, and dual-modality tracing. Concretely, we employ background modeling to filter out the motion information and keep the static information for later vehicle detection. For the vehicle detection and tracking module, we adopt YOLOv5 and multi-scale tracking to localize the anomalies. Besides, we utilize the frame difference and tracking results to identify the road and obtain the mask. In addition, we introduce multiple similarity estimation metrics to refine the anomaly period via backtracking. Finally, we propose a dual-modality bilateral tracing module to further refine the anomaly time. The experiments conducted on the Track 4 test set of the NVIDIA 2021 AI City Challenge yielded an F1-score of 0.9302 and a root mean square error (RMSE) of 3.4039, indicating the effectiveness of our framework.
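The background modeling step can be illustrated with a simple exponential moving average over frames, which washes out moving vehicles while a stalled (anomalous) vehicle gradually persists into the background image. This sketch is a generic baseline rather than the exact modeling used in the paper, and the smoothing factor is an assumption.

```python
import numpy as np

def running_background(frames, alpha=0.02):
    """Estimate a static background by exponential moving average (illustrative).

    frames: iterable of H x W x 3 uint8 video frames.
    Moving vehicles are averaged out of the result, while a stalled vehicle
    slowly persists into it, where a detector such as YOLOv5 can then
    localize the anomaly on the background image.
    """
    background = None
    for frame in frames:
        f = frame.astype(np.float32)
        background = f if background is None else (1 - alpha) * background + alpha * f
    return background.astype(np.uint8)
```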
2020
VCIP
Drone-Based Car Counting via Density Map Learning
Jingxian Huang *, Guanchen Ding *, Yujia Guo, Daiqin Yang, Sihan Wang, Tao Wang, and Yunfei Zhang
In 2020 IEEE International Conference on Visual Communications and Image Processing (VCIP), Jun 2020
Car counting in drone-based images is a challenging task in computer vision. Most advanced counting methods are based on density maps. Usually, ground truth density maps are first generated by convolving ground truth point maps with a Gaussian kernel for later model learning (generation); the counting network then learns to predict density maps from input images (estimation). Most studies focus on the estimation problem while overlooking the generation problem. In this paper, a training framework is proposed that generates density maps by learning and trains the generation and estimation subnetworks jointly. Experiments demonstrate that our method outperforms other density map-based methods and achieves the best performance on drone-based car counting.
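The conventional generation step that the paper replaces with a learned subnetwork can be sketched as follows: each annotated car point is turned into a unit impulse and smoothed with a fixed-width Gaussian, so the resulting map integrates approximately to the car count. The kernel width here is an arbitrary illustrative value.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def fixed_kernel_density_map(points, shape, sigma=4.0):
    """Baseline ground-truth density map with a fixed Gaussian kernel.

    points: iterable of (x, y) car annotations; shape: (H, W) of the image.
    The map sums (approximately) to the number of annotated cars.
    """
    h, w = shape
    impulses = np.zeros((h, w), dtype=np.float32)
    for x, y in points:
        impulses[min(int(y), h - 1), min(int(x), w - 1)] += 1.0
    return gaussian_filter(impulses, sigma)
```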