walkboutr
package will process GPS and accelerometry
data and create two different outputs:
- Full dataset: This dataframe contains all of the original data (latitude, longitude, activity counts) as well as the epoch start time. This time will match the times associated with the accelerometry data, and the GPS data have been matched up to the closest accelerometry epochs. The time variable returned, thus, reflects that of the accelerometry data. Note: GPS data are assigned to an epoch start time by rounding down the time associated with the GPS datapoint to the nearest epoch start time. For example, if epochs in the accelerometry data are 30 seconds, the time associated with a GPS data point will be rounded down to the nearest 30-second increment.
- Summarized dataset: This dataframe does not contain any of the original GPS/accelerometry data, and is thus completely de-identified and shareable. The output contains one row for each bout (walking or otherwise) as well as information on the median speed for that bout, whether there was a complete day worth of data for the bout, the start time of the bout, the duration in minutes, and the bout category. More details on bout category can be found below.
First we will generate some sample data.
To do this, we will use the
generate_walking_in_seattle_gps_data()
function to create
GPS data. This function generates a data frame containing GPS data for a
walking activity in Seattle, WA on April 7th, 2012. It calls the
function generate_gps_data to create a series of GPS locations and
speeds. The resulting data frame has columns for time, latitude,
longitude, and speed.
We will generate acceleromatry data using the
make_smallest_bout_without_metadata()
function. This
function creates the smallest bout window without the metadata columns
by calling the function make_smallest_bout
– which
generates a dataset representing the smallest bout, consisting of a
sequence of inactive periods followed by the smallest active period –
and removing the columns “non_wearing”, “complete_day”, and “bout”.
Together, this will give us a small sample of GPS and accelerometry
data that we can use as inputs to walkboutr
.
Here we generate our sample data:
gps_data <- generate_walking_in_seattle_gps_data()
accelerometry_counts <- make_full_day_bout_without_metadata()
Now that we have sample data, we can look at how
walkboutr
generates bouts:
walk_bouts <- identify_walk_bouts_in_gps_and_accelerometry_data(gps_data, accelerometry_counts)
summary_walk_bouts <- summarize_walk_bouts(walk_bouts)
The bouts identified look like this:
bout | bout_category | activity_counts | time | non_wearing | complete_day | latitude | longitude | speed |
---|---|---|---|---|---|---|---|---|
1 | walk_bout | 500 | 2012-04-07 00:03:00 | FALSE | TRUE | 47.63629 | 122.3622 | 1.1009735 |
1 | walk_bout | 500 | 2012-04-07 00:05:00 | FALSE | TRUE | 47.67909 | 122.4050 | 2.7901428 |
1 | walk_bout | 500 | 2012-04-07 00:07:00 | FALSE | TRUE | 47.74009 | 122.4660 | 0.9801357 |
1 | walk_bout | 500 | 2012-04-07 00:05:30 | FALSE | TRUE | 47.69225 | 122.4182 | 2.7249735 |
1 | walk_bout | 500 | 2012-04-07 00:06:00 | FALSE | TRUE | 47.70489 | 122.4308 | 4.0867381 |
1 | walk_bout | 500 | 2012-04-07 00:06:30 | FALSE | TRUE | 47.72484 | 122.4508 | 3.0513150 |
We can now use the second function to generate our summarized dataset, which is de-identified and shareable:
bout | median_speed | complete_day | bout_start | duration | bout_category |
---|---|---|---|---|---|
1 | 2.736466 | TRUE | 2012-04-07 00:02:30 | 5.0 | walk_bout |
2 | 2.555720 | TRUE | 2012-04-07 00:09:30 | 4304.5 | walk_bout |
-
Walk bout a
walk_bout
is defined based on the scientific literature as: Assuming a greedy algorithm and consideration of inactive time as consecutive, a walk bout is any contiguous period of time where the active epochs have accelerometry counts above the minimum threshold of 500 CPE (to allow for capture of light physical activity such as slow walking) and the time period:- Begins with an active epoch preceded by a non-walkbout
- Ends with an active epoch followed by at least 4 consecutive 30-second epochs of inactivity
- Contains at least 10 cumulative 30-second epochs of activity
- Is not a dwell bout
- Bout median speed based on GPS data falls between 2 and 6 kilometers
per hour (our reference walking speeds)
Accordingly, the following non-walk-bouts are defined as:
-
Non-walk bout due to slow pace a
non_walk_slow
bout is a bout where the median speed is too slow to be considered walking. -
Non-walk bout due to fast pace a
non_walk_fast
bout is a bout where the median speed is too fast to be considered walking. -
Non-walk bout due to high CPE a
non_walk_too_vigorous
bout is a bout where the average CPE is too high to be considered walking (ex. running or biking). -
Dwell bout a
dwell_bout
is a bout where the radius of GPS points is below our threshold for considering someone to have stayed in one place. -
Non-walk bout due to incomplete GPS coverage a
non_walk_incomplete_gps
bout is a bout where the GPS coverage is too low to be considered complete.
In order to better visualize our bouts, we can also plot the accelerometry counts and GPS radius.
accelerometry_counts <- make_smallest_bout_without_metadata()
gps_data <- generate_walking_in_seattle_gps_data()
generate_bout_plot(accelerometry_counts, gps_data, 1)
In this plot, the black dots and lines represent the accelerometry counts, plotted over time. The red dotted line shows us our threshold for a bout to be considered a walk bout, which is 500 counts per epoch. Based on this, we can see that the non-dwell walk bout starts at the first point where the counts exceed 500, and ends at the last point before the counts drop below 500 again.
The top right corner shows a circle representing the GPS radius, which is calculated based on the GPS data. The blue part of this circle (the small dot in the middle) represents the area that would be considered a dwell bout based on our dwell bout radius threshold. We can see that the GPS radius is larger than the dwell bout radius, as indicated by the green circle (the area in which this bout took place) being larger than the blue circle. As such, this indicates that the person was moving through a larger space than a dwelling during this bout.
Accordingly, this plot shows us that this was a non-dwell walk bout, as the accelerometry counts exceed the threshold for a walk bout, and the GPS radius is larger than the dwell bout radius.