The effect of time on the automated detection of the pharyngeal phase in videofluoroscopic swallowing studies

Bandini, Andrea; Steele, Catriona M

doi:10.1109/EMBC46164.2021.9629562

Convolutional Neural Networks (CNNs) have recently been proposed to automatically detect the pharyngeal phase in videofluoroscopic swallowing studies (VFSS). However, there is a lack of consensus regarding the best algorithmic strategy to adopt for segmenting this important yet rapid phase of the swallow. Moreover, additional information is needed to understand how small the detection error should be, in view of translating this approach for use in clinical practice. In this manuscript we compare multiple CNN-based algorithms for detecting the pharyngeal phase in VFSS bolus-level clips, specifically looking at 2DCNN and 3DCNN approaches with different temporal windows as input. Our results showed that a 2DCNN analysis on 3-frame windows outperformed both frame-by-frame approaches and 3DCNNs. We also demonstrated that the detection accuracy of the pharyngeal phase is very close to the clinical gold standard (i.e., trained clinical raters). These results demonstrate the feasibility of deep learning-based algorithms for developing intelligent approaches to automatically support clinicians in the analysis of VFSS data.

Clinical relevance: Accurate and reliable segmentation of the pharyngeal phase will support clinicians by reducing the time needed for rating VFSS data. Moreover, automatic detection of this phase can be seen as a foundation for building novel and intelligent approaches to detect clinical features of interest in VFSS, such as the presence of penetration-aspiration.
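To make the idea of a 3-frame temporal window concrete, the following is a minimal sketch, not the authors' implementation. It assumes a clip stored as a grayscale array of shape (T, H, W), assumes that the three consecutive frames are stacked as input channels for a 2D CNN, and assumes edge frames are padded by repetition; the abstract specifies none of these details. The helper names `three_frame_windows` and `phase_bounds`, and the 0.5 threshold, are hypothetical. The second helper shows one possible way to turn per-frame scores into a single phase interval.

```python
import numpy as np


def three_frame_windows(clip):
    """Build one 3-frame window per frame of a (T, H, W) clip.

    Assumption: the window is centered on frame t and the first/last
    frames are repeated to pad the clip edges. Returns (T, 3, H, W).
    """
    padded = np.concatenate([clip[:1], clip, clip[-1:]], axis=0)
    # Channel 0 = previous frame, 1 = current frame, 2 = next frame.
    return np.stack([padded[:-2], padded[1:-1], padded[2:]], axis=1)


def phase_bounds(frame_probs, threshold=0.5):
    """Return (first, last) indices of the longest run of frames whose
    score is >= threshold, or None if no frame reaches the threshold.

    Assumption: the pharyngeal phase is one contiguous interval per clip.
    """
    mask = np.asarray(frame_probs) >= threshold
    best, start = None, None
    # A trailing False closes any run that reaches the end of the clip.
    for i, above in enumerate(list(mask) + [False]):
        if above and start is None:
            start = i
        elif not above and start is not None:
            if best is None or (i - start) > (best[1] - best[0] + 1):
                best = (start, i - 1)
            start = None
    return best
```

In this sketch, each window would be passed to a frame-level classifier and the resulting scores to `phase_bounds`; the detected onset and offset frames could then be compared against clinician ratings to measure the detection error in frames.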