Lesson 9. Working with activity data: what the animal was doing there?


Once acceleration data are stored into the database, they are ready to be analyzed. Acceleration data can provide valuable information and can help to create the so called semantic trajectories, i.e. description of the movement that identify specific actions of animals while they interact with the surrounding environment. In this lesson you will see some example of exploratory analysis.

Topic 1. Relate activity and GPS data

Introduction

You can now join activity and GPS data. In our activity data set, activity data are recorded with a frequency of five minutes, thus much more often than GPS positions (with a frequency of four hours). Hence, you need to combine the two data sets in a way that makes sense for the specific scientific question to be asked.

Example

The most simple approach to relate activity and GPS data is to sub-sample the activity data to the same temporal resolution as the GPS positions. This can be done by a simple JOIN between the GPS data and the activity data, where you combine the data from the two sensors on the same animal at the same time. The result is that you associate the activity data record (what the animal is doing) to each GPS position (where the animal is doing it). You look for the activity data that is in an interval of 5 minutes (2 minutes and 30 seconds after and before) from the GPS position acquisition time. The option LIMIT 5 limits the result to the first 5 records. You can see that with large amounts of data, the processing time can increase considerably:

SELECT
  gps_data_animals.animals_id AS id,
  gps_data_animals.acquisition_time AS acquisition_time_gps,
  activity_data_animals.acquisition_time::time AS time_act,
  ST_X(gps_data_animals.geom)::numeric(7,5) AS gps_x,
  ST_Y(gps_data_animals.geom)::numeric(7,5) AS gps_y,
  activity_data_animals.activity_x AS act_x,
  activity_data_animals.activity_y AS act_y
FROM
  main.gps_data_animals
INNER JOIN
  main.activity_data_animals
ON
  gps_data_animals.animals_id = activity_data_animals.animals_id AND
  ((gps_data_animals.acquisition_time - activity_data_animals.acquisition_time) < interval '150 second') AND
  ((gps_data_animals.acquisition_time - activity_data_animals.acquisition_time) > interval '-150 second') AND
  gps_validity_code = 1
LIMIT 5

The result is

id acquisition_time_gps time_act gps_x gps_y act_x act_y
3 2005-10-24 00:00:55+00 00:00:00 11.05471 46.00606 17 40
3 2005-10-24 04:00:55+00 04:00:00 11.05290 46.00842 8 36
3 2005-10-24 16:02:57+00 16:05:00 11.06117 46.00654 10 39
3 2005-10-25 00:01:23+00 00:00:00 11.05217 46.00757 23 36
3 2005-10-25 04:00:53+00 04:00:00 11.05288 46.00767 27 38

In the next query, you retrieve the average and standard deviation value for activity in the different land cover types. In this way, you can see whether the data indicate an obvious effect of the environment on the activity of the animal:

SELECT
  label3 AS land_cover,
  avg(activity_data_animals.activity_x)::numeric(5,2) AS avg_x,
  stddev(activity_data_animals.activity_x)::numeric(5,2) AS stddev_x,
  avg(activity_data_animals.activity_y)::numeric(5,2) AS avg_y,
  stddev(activity_data_animals.activity_y)::numeric(5,2) AS stddev_y,
  count(label3) AS num
FROM
  main.gps_data_animals,
  main.activity_data_animals,
  env_data.corine_land_cover_legend
WHERE
  gps_data_animals.animals_id = activity_data_animals.animals_id AND
  ((gps_data_animals.acquisition_time - activity_data_animals.acquisition_time) < interval '150 second') AND
  ((gps_data_animals.acquisition_time - activity_data_animals.acquisition_time) > interval '-150 second') AND
  gps_validity_code = 1 AND
  gps_data_animals.corine_land_cover_code = corine_land_cover_legend.grid_code
GROUP BY
  label3;

The result reported is

land_cover avg_x stddev_x avg_y stddev_y num
Mixed forest 18.05 32.87 22.72 32.18 378
Broad-leaved forest 13.08 20.27 22.07 25.43 431
Sparsely vegetated areas 13.82 12.44 29.44 23.44 101
Coniferous forest 1.00 1.00 1
Pastures 29.81 37.71 36.74 35.38 772

As you can see, the highest activity rate is recorded in pasture lands, which might lead to interesting considerations on the behaviour of the animal.
In the last example, you retrieve the average activity value of the 6 activity records that are closest in time to each GPS position, which means 15 minutes after and before.

SELECT
  gps_data_animals.gps_data_animals_id AS gps_id,
  gps_data_animals.animals_id AS animal,
  gps_data_animals.acquisition_time,
  avg(activity_data_animals.activity_x)::numeric(5,2) AS avg_act_x,
  avg(activity_data_animals.activity_y)::numeric(5,2) AS avg_act_y
FROM
  main.gps_data_animals
INNER JOIN
  main.activity_data_animals
ON
  gps_data_animals.animals_id = activity_data_animals.animals_id AND
  ((gps_data_animals.acquisition_time - activity_data_animals.acquisition_time) < interval '15 minute') AND
  ((gps_data_animals.acquisition_time - activity_data_animals.acquisition_time) > interval '-15 minute') AND
  gps_validity_code = 1
GROUP BY
  gps_data_animals.gps_data_animals_id,
  gps_data_animals.animals_id,
  gps_data_animals.acquisition_time
ORDER BY
  gps_data_animals.acquisition_time
LIMIT 5;

The result is

gps_id animal acquisition_time avg_act_x avg_act_y
15275 3 2005-10-23 20:00:53+00 27.67 56.33
15276 3 2005-10-24 00:00:55+00 34.17 66.00
15277 3 2005-10-24 04:00:55+00 16.00 33.83
15280 3 2005-10-24 16:02:57+00 5.17 16.33
15281 3 2005-10-24 20:01:49+00 15.83 29.83

Exercise

  1. Retrieve the centroid of locations for animal 3 for each month together with the average monthly value of act_x and act_y

Topic 2. Identification of animal's activities from activity data

Introduction

The main goal of activity sensors id to provide data that can help to decipher what an animal was doing. Transforming activity data into animal's activity is a complex process and it is still a field of research. In this lesson we explore simple approaches.

Example

Assuming that the value '0' correspond to a very low activity (e.g. resting), you can calculate the percentage of records with that value per each hour of the day (regardless of the animal) during June. You can use the CASE SQL command to transform the activity data into 1 if the original value is 0 and into 0 if the original value was greater than 0. Then you can count the total number of records and make a sum to know the percentage of 'inactive' records.

SELECT 
  animals_id, 
  EXTRACT(HOUR FROM acquisition_time) AS hour_day, 
  sum(CASE WHEN activity_x = 0 THEN 1 ELSE 0 END)*100/count(*) AS perc_inactivity_x,
  sum(CASE WHEN activity_y = 0 THEN 1 ELSE 0 END)*100/count(*) AS perc_inactivity_y
FROM 
  main.activity_data_animals
WHERE 
  EXTRACT(MONTH FROM acquisition_time) = 6
GROUP BY 
  animals_id, 
  EXTRACT(HOUR FROM acquisition_time)
ORDER BY 
  EXTRACT(HOUR FROM acquisition_time);

The result is

animals_id hour_day perc_inactivity_x perc_inactivity_y
3 0 20 15
3 1 30 22
3 2 19 13
3 3 13 5
3 4 25 17
3 5 28 17
3 6 25 16
3 7 25 14
3 8 33 17
3 9 28 12
3 10 31 13
3 11 28 10
3 12 19 10
3 13 18 7
3 14 21 8
3 15 20 7
3 16 14 6
3 17 13 5
3 18 10 7
3 19 7 2
3 20 18 12
3 21 21 17
3 22 22 18
3 23 23 15

Exercise

  1. What is the percentage of inactive records (i.e. activity = 0) in December? Is any clear season pattern (compare with percentage of inactive records in June)?

Summary exercise of Lesson 9

  1. Is there any daily pattern in the average activity value (compare percentage of inactive records per hour of the day)?
  2. Associate GPS location with activity data that are inside a time interval of 2 hours after and before the GPS acquisition time and calculate the percentage of inactive activity record per GPS location. How many locations have a percentage of inactivity greater than 75%