A "Learned" Approach to Quicken and Compress Rank/Select Dictionaries

IRIS

We address the well-known problem of designing, implementing and experimenting compressed data structures for supporting rank and select queries over a dictionary of integers. This problem has been studied far and wide since the end of the ‘80s with tons of important theoretical and practical results. Following a recent line of research on the so-called learned data structures, we first show that this problem has a surprising connection with the geometry of a set of points in the Cartesian plane suitably derived from the input integers. We then build upon some classical results in computational geometry to introduce the first “learned” scheme for implementing a compressed rank/select dictionary. We prove theoretical bounds on its time and space performance both in the worst case and in the case of input distributions with finite mean and variance. We corroborate these theoretical results with a large set of experiments over datasets originating from a variety of sources and applications (Web, DNA sequencing, information retrieval and natural language processing), and we show that a carefully engineered version of our approach provides new interesting space-time trade-offs with respect to several well-established implementations of Elias-Fano, RRR-vector, and random-access vectors of Elias γ/δ-coded gaps.

A "Learned" Approach to Quicken and Compress Rank/Select Dictionaries

Antonio Boffa;Paolo Ferragina;Giorgio Vinciguerra

2021-01-01

Abstract

We address the well-known problem of designing, implementing and experimenting compressed data structures for supporting rank and select queries over a dictionary of integers. This problem has been studied far and wide since the end of the ‘80s with tons of important theoretical and practical results. Following a recent line of research on the so-called learned data structures, we first show that this problem has a surprising connection with the geometry of a set of points in the Cartesian plane suitably derived from the input integers. We then build upon some classical results in computational geometry to introduce the first “learned” scheme for implementing a compressed rank/select dictionary. We prove theoretical bounds on its time and space performance both in the worst case and in the case of input distributions with finite mean and variance. We corroborate these theoretical results with a large set of experiments over datasets originating from a variety of sources and applications (Web, DNA sequencing, information retrieval and natural language processing), and we show that a carefully engineered version of our approach provides new interesting space-time trade-offs with respect to several well-established implementations of Elias-Fano, RRR-vector, and random-access vectors of Elias γ/δ-coded gaps.

Scheda breve

Scheda completa

Scheda completa (DC)

	Anno del prodotto
	
				2021
			
	Codice ISBN
	
				978-1-61197-647-2
			
	Appare nelle tipologie:
	
				4.1 Contributo Atti Congressi/Articoli in extenso

File in questo prodotto:

File	Dimensione	Formato
1.9781611976472.4.pdf non disponibili Licenza: Copyright dell'editore Dimensione 2.5 MB Formato Adobe PDF Visualizza/Apri Richiedi una copia	2.5 MB	Adobe PDF	Visualizza/Apri Richiedi una copia

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11382/566793

Attenzione

Attenzione! I dati visualizzati non sono stati sottoposti a validazione da parte dell'ateneo

Citazioni

ND

16

social impact