Design and implementation of a NoSQL data access interface
Keywords: Document-oriented database; MongoDB; Spatial data; Spatio-temporal data; Data access interface; NoSQL stores
In recent years, the development of positioning technologies and the prevalence of GPS-equipped devices have generated vast amounts of data with location and time information. Uber, one of the best-known transportation network companies, records about 15 million trips every day over 600 cities worldwide. An Uber car drives a passenger from a departure location to a destination. During a trip, the car's embedded GPS reports its geo-location over time, and the sequence of these locations forms a trajectory. Foursquare, a location technology company that provides a mobile application for personalized place recommendations, reports 8 million check-ins from its users daily. Many challenges arise from the rapid growth of spatial and spatio-temporal data in volume and velocity, but the main goal remains their efficient management and access. The amount of such data exceeds the storage and processing capabilities of a single machine, so distributed database systems are adopted. In particular, NoSQL databases provide a promising solution for handling massive data, offering high performance, availability and scalability. Many NoSQL databases do not directly support spatial or spatio-temporal indexing, but several studies propose techniques for supporting this type of data. Aiming to provide a unified way to access big data stored in NoSQL databases, we present an API that simplifies data access by hiding implementation details. The API integrates with analytics tasks, as it offers many primitives and operators for expressing data access operations. Beyond this general functionality, the API is extended to support spatial and spatio-temporal data access. Moreover, we conduct extensive experiments on MongoDB with real and synthetic mobility datasets to study its efficiency and scalability for spatial and spatio-temporal data.
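To make the notion of a spatio-temporal data access operation concrete, the following is a minimal sketch of how a "box plus time interval" request might be expressed as a MongoDB filter document. It is not the API presented in this work: the function name `spatiotemporal_filter` and the field names `location` and `timestamp` are hypothetical, and the sketch assumes each document stores its position as a legacy `[longitude, latitude]` coordinate pair. Only the standard MongoDB operators `$geoWithin`, `$box`, `$gte` and `$lte` are used.

```python
from datetime import datetime, timezone


def spatiotemporal_filter(lon_min, lat_min, lon_max, lat_max, t_start, t_end,
                          loc_field="location", time_field="timestamp"):
    """Build a MongoDB filter for points inside a rectangle during [t_start, t_end].

    Field names are illustrative assumptions, not taken from the paper.
    """
    if lon_min > lon_max or lat_min > lat_max:
        raise ValueError("rectangle corners must be (bottom-left, upper-right)")
    if t_start > t_end:
        raise ValueError("t_start must not be later than t_end")
    return {
        # $box takes the bottom-left and upper-right corners as [lon, lat] pairs.
        loc_field: {"$geoWithin": {"$box": [[lon_min, lat_min], [lon_max, lat_max]]}},
        # Closed time interval on the timestamp field.
        time_field: {"$gte": t_start, "$lte": t_end},
    }


# Example: all points in a rectangle during one day (UTC).
query = spatiotemporal_filter(
    23.6, 37.9, 23.8, 38.1,
    datetime(2018, 1, 1, tzinfo=timezone.utc),
    datetime(2018, 1, 2, tzinfo=timezone.utc),
)
```

With a driver such as PyMongo, a filter like `query` could be passed to `collection.find(query)`. Wrapping the construction in a function is one way an access interface might hide operator syntax from analytics code.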