Lesson 8. Integrating activity data into the database
In the previous lessons, you have worked exclusively with GPS position data. We showed how to organise these data in databases and how to link them to environmental data. In this lesson, we introduce an example of data recorded by another type of sensor: acceleration data, which many tags record alongside GPS positions and which are widely used to interpret the behaviour of tagged animals. In the exercise, you will learn how to integrate acceleration data into the database, while the theoretical part discusses the main data management challenges raised by this kind of massive data set.
Topic 1. Extend the database: data on sensors and animals
In the previous lesson you learned how to correlate GPS positions with other spatio-temporal information such as NDVI values and DEMs. However, many kinds of bio-logging sensors are available to record a large set of information related to animals. In fact, we are quickly moving from animals monitored by a single sensor, usually a GPS receiver, to animals monitored by multiple, integrated sensors that register spatial (i.e. GPS positions) and non-spatial measurements such as acceleration, temperature, or GSM signal quality. In recent years, commercial solutions have emerged to deploy cameras on animals, or even internal sensors in the animals' bodies to register heartbeat and body temperature. These sensors are usually integrated on a single physical support (the collar or tag). Data from all these different sensors can be related to the spatial position of the animal and to each other on the basis of the acquisition time of the recorded information, thus giving a complete picture of the animal at a certain time. This integrated set of information can be used to fully decipher the animals' behaviour in space and time. The opportunity to answer new biological questions through the information derived from these multi-sensor monitoring platforms implies a number of further challenges in terms of data management. To be able to explore the multifaceted aspects of animals' behaviour, researchers must deal with even bigger and more diverse sets of data that require a more complex database data model and more complex data acquisition procedures.
A complete discussion of methods to integrate data from all the available bio-logging sensors is outside the scope of this book. We will use acceleration data as an example to illustrate how you can include other sensor data in the database. One of the most-used non-spatial sensors in animal ecology is the acceleration sensor (accelerometer), often called an 'activity' sensor, which measures the acceleration of the body where the sensor is fixed. In wildlife telemetry studies, the activity data measured by these accelerometers, in combination with the spatial position of the animal, are widely used for a range of purposes:
- To detect time of death (the original function of activity sensors in wildlife GPS devices).
- In life-strategy investigations, to find general species-specific annual patterns and seasonal levels of activity parameters such as mean activity, day-night patterns, and the number and duration of activity or resting phases.
- To distinguish behaviours by identification of behaviour-specific animal movement patterns using triaxial, or 3D, accelerometry. Past studies have shown that triaxial accelerometry data collected from sensors on animals can be used to detect very subtle differences in the patterns of movement such as step counts, the distinction between different gaits (e.g., jump, gallop, trot, pace), or lameness and handicaps.
- To automatically detect specific behaviours like calving, resting, fast locomotion or hunting. In combination with knowledge of the GPS position of the animal, it is then possible to create a so-called 'functional habitat use map' that says not only which places are used by the animal, but also what the animal is doing at those places or habitats. Since the activity data typically have a higher temporal resolution than the GPS positions, they also give more detailed information about the behaviour during the time between two GPS positions (e.g., resting behaviour, fast or slow locomotion), and a significantly finer movement pattern of the animal can be reconstructed than by using GPS positions alone.
- To calculate energy expenditure and metabolic projections using 3D accelerometry and calibration for different conditions.
- To non-invasively detect stress conditions on free-running wild animals through chronobiological time-series analysis or to quantify anthropogenic disturbances by controlled disturbance trials.
Acceleration sensors measure acceleration by using mass inertia, sensing how much a mass presses on something when a force acts on it. In general, piezoelectric accelerometers are used in wildlife telemetry in which a crystal is attached to a mass, so when the accelerometer moves, the mass squeezes the crystal and generates a tiny electric voltage. Although this measurement procedure is essentially identical in all accelerometers that are used on animals, the acceleration sensors of various devices differ widely due to internal data processing methods, resolution, and sensitivity. In addition, sensors can measure in 1, 2, or 3 axes, with triaxial accelerometers providing three output signals—X, Y and Z—each for one of the three perpendicular axes.
Hence, the data obtained from accelerometers of different devices differ crucially from each other (e.g., 1D or 3D, different time intervals for measurements, recording absolute or mean values per time unit), and researchers must think carefully about what kind of measuring system, what measuring interval, and what analysis should be used to properly address the study question. For instance, some specific analysis tasks (e.g., distinguishing between different behaviours) require high-resolution triaxial acceleration measurements that not all devices are capable of. The most important selection criteria among the 'activity' sensors of different brands are the number of measuring axes, the options for recording and storage interval settings, the type of internal data processing, and the capabilities for data storage or data transmission. For the practical management of a database, this means that activity data vary greatly in structure depending on the device used and its sensor settings: they may have been measured along one, two, or three axes, or by an all-round sensor, resulting in one, two, or three values per measurement. Furthermore, there are activity data recorded and stored at the millisecond scale without any data processing (e.g. e-Obs GPS devices), and there are activity measurements for which the original data were processed within the collar and the data output is at the minute scale (e.g. Vectronic Aerospace or Lotek devices). No matter what activity sensors were used, you will always get the following output: for each measurement time (which could be at millisecond, second, or minute resolution) there are one to three activity values (depending on the number of axes) that vary within their measurement range.
The general structure of activity sensors and activity sensor data is similar to that of data from GPS sensors. The sensor is associated with (deployed on) an animal for a defined time range. In most of the cases, GPS and activity sensors are attached to the same support (typically, a collar). In your data model, you have to clearly identify collars and sensors as separate (but related) objects. Different sensors attached to the same collar might have different activation periods, e.g. one sensor can stop working while the other(s) continues to record information. In our case, we do not include the objects 'collars' in the database, in order to simplify the data model and because they have no additional interesting information associated with them. This is just an example and other approaches are possible.
The size of activity data sets can be orders of magnitude greater than that of GPS data sets. This might imply performance issues in terms of processing time and storage, which can suggest a different data management approach. For example, you might decide to keep a single table for activity data, joining together raw data and derived information (e.g. the identifier of the animal). You can also use raw data just in the import stage and then delete them from the database, keeping the plain text file downloaded from the sensors as backup. The best choice depends on the size of the database, the desired performance, the specific goals, and the operational environment. In any case, as activity data are generally acquired periodically by radio (monthly or weekly) or just once per sensor through a cable when the sensor is physically recovered after being removed from (or falling off) the animal, you do not necessarily need to set up automatic, real-time procedures for data import (e.g. association with the animal, creation of an acquisition timestamp from the time and date received from the collar, quality checks). Finally, data often come in different formats from different activity sensors; thus a specific import procedure might be necessary for each type of activity sensor.
For operational databases, keep in mind that when a single table becomes very big, for example bigger than the RAM of your computer, it might be convenient to use partitioned table functionality, which splits (behind the scenes) what is logically one large table into smaller physical pieces called partitions. Another issue related to large tables is the use of foreign keys and indexes, which can significantly slow down data import. Again, you have to evaluate the best solution according to the size of your database, the frequency of updates and routine analyses, and the desired performance of your information system. In this exercise, we will keep it simple and use the same approach as for GPS data. Now, you have to extend the database structure to accommodate this new information. The tables needed are:
- a table for information about activity sensors;
- a table for information about the deployment of the activity sensors on animals;
- a table for the raw data coming from activity sensors;
- a table for activity data associated with animals.
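For very large activity tables, the partitioning approach mentioned above could be sketched as follows. This is a hypothetical illustration, not part of the exercise: the table and partition names are invented, and declarative partitioning requires PostgreSQL 10 or later.

```sql
-- Hypothetical sketch: range-partitioning a large activity table by year.
-- Table and partition names are illustrative only.
CREATE TABLE main.activity_data_part(
  activity_sensors_code character varying,
  acquisition_time timestamp with time zone NOT NULL,
  activity_x integer,
  activity_y integer,
  temp double precision
) PARTITION BY RANGE (acquisition_time);

-- One physical partition per year; queries on main.activity_data_part
-- are automatically routed to the relevant partition(s).
CREATE TABLE main.activity_data_part_2005
  PARTITION OF main.activity_data_part
  FOR VALUES FROM ('2005-01-01') TO ('2006-01-01');
CREATE TABLE main.activity_data_part_2006
  PARTITION OF main.activity_data_part
  FOR VALUES FROM ('2006-01-01') TO ('2007-01-01');
```

With this layout, inserts and time-range queries touch only the relevant yearly partition, which keeps indexes smaller and imports faster.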
In the test data set (\tracking_db\data\sensors_data), you have a .csv file (ACTIVITY01508.csv) with the data collected by a Vectronic Aerospace GmbH activity sensor (ACTIVITY01508). It contains more than 100,000 records. You can explore the file content with a text editor. As you can see, you have eight attributes: the code of the activity sensor (created in the data acquisition step), the UTC and LMT date and time, the activity along the X and Y axes, and the temperature. Axis X measures acceleration in forward/backward motions as well as pitch angle by gravitational acceleration; axis Y measures sideward as well as rotary motion using gravitational acceleration. The sensor is queried four times per second simultaneously on both axes. All measurements on each of the X and Y axes are averaged over the user-selected sampling interval (here, 5 minutes) and given a value within a relative range between 0 and 255 characterizing the mean X- and Y-activity of each 5-minute interval. The activity sensor was physically mounted on the same collar as the GPS sensor GPS01508. When you define the link between the activity sensor and the animal (deployment time range), this relationship will be explicitly defined by the fact that both GPS and activity sensors are activated on the same animal at the same time.
First of all, we create a new table to accommodate information about activity sensors:
CREATE TABLE main.activity_sensors(
  activity_sensors_id integer,
  vendor character varying,
  activity_sensors_code character varying,
  model character varying,
  insert_timestamp timestamp with time zone DEFAULT now(),
  update_timestamp timestamp with time zone DEFAULT now(),
  CONSTRAINT activity_sensors_pkey
    PRIMARY KEY (activity_sensors_id),
  CONSTRAINT activity_sensors_code_unique
    UNIQUE (activity_sensors_code)
);
COMMENT ON TABLE main.activity_sensors
  IS 'Catalogue of activity sensors.';
CREATE TRIGGER update_timestamp
  BEFORE UPDATE ON main.activity_sensors
  FOR EACH ROW
  EXECUTE PROCEDURE tools.timestamp_last_update();
Now we can populate it. As we have a single activity sensor, we can directly insert the record:
INSERT INTO main.activity_sensors
  (activity_sensors_id, vendor, activity_sensors_code, model)
VALUES
  (1, 'Vectronic', 'ACTIVITY01508', 'Basic model');
You must now define a table to store the deployment time range of the activity sensor on an animal:
CREATE TABLE main.activity_sensors_animals(
  activity_sensors_animals_id serial,
  animals_id integer NOT NULL,
  activity_sensors_id integer NOT NULL,
  start_time timestamp with time zone NOT NULL,
  end_time timestamp with time zone,
  notes character varying,
  insert_timestamp timestamp with time zone DEFAULT now(),
  update_timestamp timestamp with time zone DEFAULT now(),
  CONSTRAINT activity_sensors_animals_pkey
    PRIMARY KEY (activity_sensors_animals_id),
  CONSTRAINT activity_sensors_animals_animals_id_fkey
    FOREIGN KEY (animals_id)
    REFERENCES main.animals (animals_id)
    MATCH SIMPLE ON UPDATE NO ACTION ON DELETE CASCADE,
  CONSTRAINT activity_sensors_animals_activity_sensors_id_fkey
    FOREIGN KEY (activity_sensors_id)
    REFERENCES main.activity_sensors (activity_sensors_id)
    MATCH SIMPLE ON UPDATE NO ACTION ON DELETE CASCADE,
  CONSTRAINT time_interval_check
    CHECK (end_time > start_time)
);
COMMENT ON TABLE main.activity_sensors_animals
  IS 'Table that stores information of deployments of activity sensors on animals.';
CREATE TRIGGER update_timestamp
  BEFORE UPDATE ON main.activity_sensors_animals
  FOR EACH ROW
  EXECUTE PROCEDURE tools.timestamp_last_update();
As mentioned previously, the association between activity data and animals will be performed by the operators; thus no automatic procedure is needed to synchronize the content of this table with the activity data table.
You can now populate the table (with a single record):
INSERT INTO main.activity_sensors_animals
  (animals_id, activity_sensors_id, start_time, end_time, notes)
VALUES
  (3, 1, '2005-10-23 20:00:53 +0', '2006-10-28 13:00:00 +0',
   'Death of animal. Sensor recovered.');
Next you create the table to host the raw data coming from the activity sensor, replicating the structure of the .csv file. You add a field to store the complete acquisition time, obtained by joining the time and date coming from the sensor. This process relies on the function tools.acquisition_time_update() already created for GPS data:
CREATE TABLE main.activity_data(
  activity_data_id serial NOT NULL,
  activity_sensors_code character varying,
  utc_date date,
  utc_time time without time zone,
  lmt_date date,
  lmt_time time without time zone,
  activity_x integer,
  activity_y integer,
  temp double precision,
  insert_timestamp timestamp with time zone DEFAULT now(),
  acquisition_time timestamp with time zone,
  CONSTRAINT activity_data_pkey
    PRIMARY KEY (activity_data_id),
  CONSTRAINT activity_data_sensors_fkey
    FOREIGN KEY (activity_sensors_code)
    REFERENCES main.activity_sensors (activity_sensors_code)
    MATCH SIMPLE ON UPDATE NO ACTION ON DELETE NO ACTION
);
COMMENT ON TABLE main.activity_data
  IS 'Table that stores raw data as they come from the activity sensors (plus the ID of the sensor).';
CREATE INDEX activity_acquisition_time_index
  ON main.activity_data
  USING btree (acquisition_time);
CREATE INDEX activity_sensors_code_index
  ON main.activity_data
  USING btree (activity_sensors_code);
CREATE TRIGGER update_acquisition_time
  BEFORE INSERT ON main.activity_data
  FOR EACH ROW
  EXECUTE PROCEDURE tools.acquisition_time_update();
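As a reminder, tools.acquisition_time_update() was defined in the GPS data lessons; conceptually, it combines the UTC date and time fields into the acquisition_time field before each insert. A minimal sketch of what such a trigger function could look like is shown below; note that this is an assumed reconstruction, and the actual definition from the earlier lesson is the one to use:

```sql
-- Sketch (assumed): a trigger function that builds acquisition_time
-- from the raw UTC date and time fields of the inserted row.
CREATE OR REPLACE FUNCTION tools.acquisition_time_update()
RETURNS trigger AS
$BODY$
BEGIN
  -- Concatenate date and time, interpreting the result as UTC
  NEW.acquisition_time = ((NEW.utc_date + NEW.utc_time) AT TIME ZONE 'UTC');
  RETURN NEW;
END;
$BODY$
LANGUAGE plpgsql;
```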
Finally, you have to create a table to store activity data associated with animals:
CREATE TABLE main.activity_data_animals(
  activity_data_animals_id serial,
  activity_sensors_id integer,
  animals_id integer,
  acquisition_time timestamp with time zone,
  activity_x integer,
  activity_y integer,
  temp double precision,
  insert_timestamp timestamp with time zone DEFAULT now(),
  CONSTRAINT activity_data_animals_pkey
    PRIMARY KEY (activity_data_animals_id),
  CONSTRAINT activity_data_animals_animals_fkey
    FOREIGN KEY (animals_id)
    REFERENCES main.animals (animals_id)
    MATCH SIMPLE ON UPDATE NO ACTION ON DELETE NO ACTION,
  CONSTRAINT activity_data_animals_sensors_fkey
    FOREIGN KEY (activity_sensors_id)
    REFERENCES main.activity_sensors (activity_sensors_id)
    MATCH SIMPLE ON UPDATE NO ACTION ON DELETE NO ACTION
);
COMMENT ON TABLE main.activity_data_animals
  IS 'Table that stores activity data associated with animals.';
CREATE INDEX activity_animals_acquisition_time_index
  ON main.activity_data_animals
  USING btree (acquisition_time);
CREATE INDEX activity_animals_id_index
  ON main.activity_data_animals
  USING btree (animals_id);
You can see a schematic representation of the relations between these new tables in the figure below.
![Figure: Tables related to activity data management](figures/schema_db_activity.png)
You are now ready to import the raw data:
COPY main.activity_data(
  activity_sensors_code, utc_date, utc_time, lmt_date, lmt_time,
  activity_x, activity_y, temp)
FROM
  'C:\tracking_db\data\sensors_data\ACTIVITY01508.csv'
WITH CSV HEADER DELIMITER ';';
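After the import, a quick sanity check confirms that the records were loaded and that the trigger populated the acquisition_time field (the exact values depend on the content of the .csv file):

```sql
-- Verify the import: record count and time range of the raw activity data
SELECT
  count(*) AS num_records,
  min(acquisition_time) AS first_record,
  max(acquisition_time) AS last_record
FROM main.activity_data;
```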
The last step is the association of activity data with the animal in the table main.activity_data_animals. The SQL code is similar to that used for GPS data:
INSERT INTO main.activity_data_animals (
  animals_id,
  activity_sensors_id,
  acquisition_time,
  activity_x,
  activity_y,
  temp)
SELECT
  activity_sensors_animals.animals_id,
  activity_sensors_animals.activity_sensors_id,
  activity_data.acquisition_time,
  activity_data.activity_x,
  activity_data.activity_y,
  activity_data.temp
FROM
  main.activity_sensors_animals,
  main.activity_data,
  main.activity_sensors
WHERE
  activity_data.activity_sensors_code = activity_sensors.activity_sensors_code AND
  activity_sensors.activity_sensors_id = activity_sensors_animals.activity_sensors_id AND
  activity_data.acquisition_time >= activity_sensors_animals.start_time AND
  activity_data.acquisition_time <= activity_sensors_animals.end_time;
You can now use the new tables to explore the data set. As an exercise, try to answer the following questions with SQL queries:
- How many activity records do I have for each animal?
- Is there any clear difference between the average activity value during the day and during the night?