Fusion of tactile and visual information in deep learning models for object recognition

Amiri M.; Falotico E.
2023-01-01

Abstract

Humans use multimodal sensory information to understand the physical properties of their environment. Intelligent decision-making systems, such as those used in robotic applications, could also exploit the fusion of multimodal information to improve their performance and reliability. In recent years, machine learning and deep learning methods have been used at the heart of such intelligent systems. Developing visuo-tactile models is a challenging task due to problems such as performance, limited datasets, reliability, and computational efficiency. In this research, we propose four efficient models based on dynamic neural network architectures for unimodal and multimodal object recognition. For unimodal object recognition, TactileNet and VisionNet are proposed. For multimodal object recognition, FusionNet-A and FusionNet-B are designed to implement early and late fusion strategies, respectively. The proposed models have a flexible structure and can change at the training or test phase to accommodate the amount of available information. Model confidence calibration is employed to enhance the reliability and generalization of the models. The proposed models are evaluated on the MIT CSAIL large-scale multimodal dataset. Our results demonstrate accurate performance in both unimodal and multimodal scenarios. We show that by using different fusion strategies and augmenting the tactile-based models with visual information, the top-1 error rate of the single-frame tactile model is reduced by 78% and the mean average precision is increased by 2.19 times. Although the focus has been on the fusion of tactile and visual modalities, the proposed design methodology can be generalized to include more modalities.
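To make the two fusion strategies mentioned in the abstract concrete, the sketch below contrasts early fusion (features from both modalities are concatenated before a shared classifier) with late fusion (each modality is classified separately and the logits are combined), followed by a post-hoc temperature-scaling routine as one common way to calibrate model confidence. All module names, layer sizes, and the use of temperature scaling are illustrative assumptions for this sketch; they are not the FusionNet-A/FusionNet-B architectures or the calibration method described in the paper.

```python
# Illustrative sketch of early vs. late visuo-tactile fusion (PyTorch).
# Encoders, input shapes, and dimensions are assumptions, not the paper's models.
import torch
import torch.nn as nn

class TactileEncoder(nn.Module):
    """Small CNN over a single tactile frame (assumed 1x32x32 input)."""
    def __init__(self, out_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
            nn.Linear(16 * 4 * 4, out_dim), nn.ReLU(),
        )
    def forward(self, x):
        return self.net(x)

class VisionEncoder(nn.Module):
    """Small CNN over an RGB frame (assumed 3x64x64 input)."""
    def __init__(self, out_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
            nn.Linear(16 * 4 * 4, out_dim), nn.ReLU(),
        )
    def forward(self, x):
        return self.net(x)

class EarlyFusionNet(nn.Module):
    """Early fusion: concatenate modality features, then a shared classifier."""
    def __init__(self, num_classes, feat_dim=128):
        super().__init__()
        self.tactile = TactileEncoder(feat_dim)
        self.vision = VisionEncoder(feat_dim)
        self.classifier = nn.Linear(2 * feat_dim, num_classes)
    def forward(self, tactile_x, vision_x):
        fused = torch.cat([self.tactile(tactile_x), self.vision(vision_x)], dim=1)
        return self.classifier(fused)

class LateFusionNet(nn.Module):
    """Late fusion: per-modality classifiers, logits averaged at the end."""
    def __init__(self, num_classes, feat_dim=128):
        super().__init__()
        self.tactile_head = nn.Sequential(TactileEncoder(feat_dim),
                                          nn.Linear(feat_dim, num_classes))
        self.vision_head = nn.Sequential(VisionEncoder(feat_dim),
                                         nn.Linear(feat_dim, num_classes))
    def forward(self, tactile_x, vision_x):
        return 0.5 * (self.tactile_head(tactile_x) + self.vision_head(vision_x))

def calibrate_with_temperature(logits, labels, lr=0.01, max_iter=100):
    """Post-hoc temperature scaling on held-out logits (one standard calibration
    technique, used here only as an illustration of confidence calibration)."""
    temperature = nn.Parameter(torch.ones(1))
    optimizer = torch.optim.LBFGS([temperature], lr=lr, max_iter=max_iter)
    nll = nn.CrossEntropyLoss()
    def closure():
        optimizer.zero_grad()
        loss = nll(logits / temperature, labels)
        loss.backward()
        return loss
    optimizer.step(closure)
    return temperature.detach()
```

In this sketch, early fusion lets the classifier learn cross-modal interactions at the feature level, while late fusion keeps the modality branches independent until the decision stage, which makes it easier to drop a missing modality at test time.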

Use this identifier to cite or link to this document: https://hdl.handle.net/11382/573532

Citations
  • Scopus: 16