Sampling of time-stamped, sequential, complex data types
Subject
Econometric models ; Time-series analysis ; Geographic Information Systems -- Statistical methods
Abstract
The aim of this thesis is the design and adaptation of sampling techniques for complex data types whose common feature is that they are time-stamped and sequential. First, these time-stamped data must be properly collected and recorded through various sampling methods. The data are then processed to produce results and conclusions.
Huge amounts of data can be collected by various sampling methods. More specifically, sampling means obtaining a portion of the data from a broader data set. Sampling methods fall into two groups: probability sampling and non-probability sampling.
In the second section, we will present the various methods of probability sampling, including Simple Random Sampling, Systematic Sampling, Stratified Sampling, Cluster Sampling and Multistage Sampling, as well as the methods of non-probability sampling, namely Convenience Sampling (also known as haphazard sampling) and Quota Sampling (sampling by ratio or percentage). In probability sampling, every observation has a known, non-zero chance of being selected (an equal chance in the case of Simple Random Sampling), while in non-probability sampling the individual observations that form the sample are selected in a fixed and predetermined way rather than by a random mechanism.
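As an illustration of three of the probability sampling methods named above, the following is a minimal sketch and not the implementation used in the thesis. The function names, the toy population of 100 integers and the stratum rule (`x % 4`) are assumptions made only for this example.

```python
import random
from collections import defaultdict


def simple_random_sample(population, n, seed=0):
    """Draw n items without replacement; every item has the same chance."""
    rng = random.Random(seed)
    return rng.sample(population, n)


def systematic_sample(population, step, start=0):
    """Take every `step`-th item, beginning at index `start`.

    Choosing `start` at random in range(step) makes this a probability sample.
    """
    return population[start::step]


def stratified_sample(population, key, per_stratum, seed=0):
    """Draw up to `per_stratum` random items from each stratum defined by `key`."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for item in population:
        strata[key(item)].append(item)
    sample = []
    for label in sorted(strata):
        members = strata[label]
        sample.extend(rng.sample(members, min(per_stratum, len(members))))
    return sample


# Hypothetical population: the integers 0..99.
population = list(range(100))
srs = simple_random_sample(population, 10)
sys_s = systematic_sample(population, step=10, start=3)
strat = stratified_sample(population, key=lambda x: x % 4, per_stratum=2)
```

Cluster and Multistage Sampling could be sketched along the same lines by first sampling whole groups and then, for the multistage case, sampling again within the selected groups.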
Next, we will analyze Timeseries, Trajectories and Webclicks, which are typical examples of sequential data. In addition, we will refer to data mining methods, which will then be used to evaluate our results.
We will discuss Timeseries and present some representative examples, in order to make the notion of sequential data easier to understand. In particular, a Timeseries is a sequence of data points, commonly measured at successive time points separated by equal intervals. The observations that make up a Timeseries are obtained at certain time points or over certain periods, which are equidistant from one another, and are collected through a sampling method.
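The definition of a Timeseries given above can be made concrete with a small sketch. It assumes a series stored as a list of `(timestamp, value)` pairs. The helper names and the ten daily values are hypothetical and serve only to show an equal-spacing check and a systematic thinning of the series.

```python
from datetime import datetime, timedelta


def is_equally_spaced(timestamps):
    """Return True if consecutive timestamps are separated by one constant interval."""
    gaps = {later - earlier for earlier, later in zip(timestamps, timestamps[1:])}
    return len(gaps) <= 1


def downsample(series, step):
    """Keep every `step`-th (timestamp, value) pair of the series."""
    return series[::step]


# Hypothetical daily series: ten consecutive days with made-up values.
start = datetime(2020, 1, 1)
series = [(start + timedelta(days=i), 100.0 + i) for i in range(10)]
timestamps = [t for t, _ in series]

regular = is_equally_spaced(timestamps)
every_fifth = downsample(series, 5)
```

Thinning an equally spaced series with a fixed step keeps it equally spaced, which is one reason Systematic Sampling is a natural first choice for such data.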
Next, we will discuss Trajectories and present some representative examples, in order to make the notion of the recorded track of a moving object easier to understand (e.g. the daily routes followed by commercial trucks in the center of Athens).
The second section concludes with the analysis of Webclicks, which record the number of clicks made on a specific site by its various users. Such recordings are sampled in order to conduct studies and draw conclusions that benefit the owners of the websites.
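To make the idea of sampling a recorded track concrete, the sketch below treats a trajectory as a time-ordered list of `(t, x, y)` points and keeps a point only when enough time has passed since the last kept point. This is a simple illustrative rule, not the method of any of the studies discussed in the thesis. The function name, the `min_gap` parameter and the coordinates are assumptions.

```python
def sample_by_time_gap(trajectory, min_gap):
    """Keep a point only if at least `min_gap` time units passed since the last kept point."""
    if not trajectory:
        return []
    kept = [trajectory[0]]  # always keep the first recorded position
    for point in trajectory[1:]:
        if point[0] - kept[-1][0] >= min_gap:
            kept.append(point)
    return kept


# Hypothetical track: (seconds since start, longitude, latitude).
track = [
    (0, 23.70, 37.90),
    (10, 23.71, 37.91),
    (25, 23.72, 37.92),
    (60, 23.75, 37.95),
    (65, 23.76, 37.96),
    (130, 23.80, 38.00),
]
sampled = sample_by_time_gap(track, min_gap=60)
```

With these made-up points, the positions recorded at 0, 60 and 130 seconds are retained, so irregularly recorded positions are reduced to a sparser track with a guaranteed minimum time between points.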
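A click log of this kind can be summarized in a few lines. The sketch below assumes each click is stored as a `(user_id, site)` pair. The log entries and site names are hypothetical.

```python
from collections import Counter

# Hypothetical click log: (user_id, site) pairs in arrival order.
clicks = [
    ("u1", "a.example"),
    ("u2", "a.example"),
    ("u1", "b.example"),
    ("u3", "a.example"),
    ("u2", "b.example"),
]

# Total number of clicks recorded for each site.
clicks_per_site = Counter(site for _, site in clicks)

# Number of distinct users who clicked on each site.
users_per_site = {
    site: len({user for user, s in clicks if s == site})
    for site in clicks_per_site
}
```

Counts like these are the raw material that a sampling scheme would then reduce, for example by sampling users or time windows rather than individual clicks.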
In the third section, we will present relevant studies on the sampling of Trajectories and Timeseries. We will also apply the sampling methods to already collected data, in order to draw some conclusions and gain familiarity with the methods.
The selected articles, which will be included in our project in summary form, are: «Segmentation and Sampling of Moving Object Trajectories Based on Representativeness», «Trajectory Sampling for Direct Traffic Observation» and «Unsupervised Trajectory Sampling». These three articles are relevant studies on trajectories and sampling. Two types of data will be used, analyzed and sampled. The first type consists of the coordinates of ships recorded over three consecutive days, and the second type consists of closing share prices for the last four years. The data were found online and have been preprocessed so that they can be used.
Finally, in the last section we will record the overall results and conclusions drawn from the study of the two previous sections.