The `walkboutr` package processes GPS and accelerometry data and creates two different outputs:
- Full dataset: This dataframe contains all of the original data (latitude, longitude, activity counts) as well as the epoch start time. This time will match the times associated with the accelerometry data, and the GPS data have been matched up to the closest accelerometry epochs. The time variable returned, thus, reflects that of the accelerometry data. Note: GPS data are assigned to an epoch start time by rounding down the time associated with the GPS datapoint to the nearest epoch start time. For example, if epochs in the accelerometry data are 30 seconds, the time associated with a GPS data point will be rounded down to the nearest 30-second increment.
- Summarized dataset: This dataframe does not contain any of the original GPS/accelerometry data, and is thus completely de-identified and shareable. The output contains one row for each bout (walking or otherwise) as well as the median speed for that bout, whether there was a complete day's worth of data for the bout, the start time of the bout, the duration in minutes, and the bout category. More details on bout category can be found below.
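The rounding rule noted for the full dataset can be illustrated in base R. This is a minimal sketch and not the package's internal code: the timestamps and the `epoch_seconds` variable are made up for illustration, and a 30-second epoch is assumed.

```r
# Hypothetical GPS timestamps (not taken from the package's sample data).
gps_times <- as.POSIXct(
  c("2012-04-07 00:03:12", "2012-04-07 00:03:29", "2012-04-07 00:03:30"),
  tz = "UTC"
)

epoch_seconds <- 30  # assumed epoch length of the accelerometry data

# Round each timestamp down to the start of the epoch that contains it.
epoch_start <- as.POSIXct(
  floor(as.numeric(gps_times) / epoch_seconds) * epoch_seconds,
  origin = "1970-01-01",
  tz = "UTC"
)

format(epoch_start, "%H:%M:%S")
# "00:03:00" "00:03:00" "00:03:30"
```

A GPS point recorded exactly on an epoch boundary keeps its own time, while any point inside an epoch takes that epoch's start time.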
First we will generate some sample data:
gps_data <- generate_walking_in_seattle_gps_data()
accelerometry_counts <- make_full_day_bout_without_metadata()
Now that we have sample data, we can look at how `walkboutr` generates bouts:
walk_bouts <- identify_walk_bouts_in_gps_and_accelerometry_data(gps_data, accelerometry_counts)
summary_walk_bouts <- summarize_walk_bouts(walk_bouts)
The bouts identified look like this:
bout | bout_category | activity_counts | time | non_wearing | complete_day | latitude | longitude | speed |
---|---|---|---|---|---|---|---|---|
1 | walk_bout | 500 | 2012-04-07 00:03:00 | FALSE | TRUE | 47.64950 | 122.3754 | 1.1009735 |
1 | walk_bout | 500 | 2012-04-07 00:05:00 | FALSE | TRUE | 47.69231 | 122.4182 | 2.7901428 |
1 | walk_bout | 500 | 2012-04-07 00:07:00 | FALSE | TRUE | 47.75330 | 122.4792 | 0.9801357 |
1 | walk_bout | 500 | 2012-04-07 00:05:30 | FALSE | TRUE | 47.70546 | 122.4314 | 2.7249735 |
1 | walk_bout | 500 | 2012-04-07 00:06:00 | FALSE | TRUE | 47.71810 | 122.4440 | 4.0867381 |
1 | walk_bout | 500 | 2012-04-07 00:06:30 | FALSE | TRUE | 47.73806 | 122.4640 | 3.0513150 |
The second function, `summarize_walk_bouts`, generates our summarized dataset, which is de-identified and shareable:
bout | median_speed | complete_day | bout_start | duration | bout_category |
---|---|---|---|---|---|
1 | 2.736466 | TRUE | 2012-04-07 00:02:30 | 5.0 | walk_bout |
2 | 2.555720 | TRUE | 2012-04-07 00:09:30 | 4304.5 | walk_bout |
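To make the summary columns concrete, here is a rough sketch of how a per-bout summary could be derived from epoch-level rows. It is not the implementation of `summarize_walk_bouts`: the helper `summarize_bouts_sketch`, the example values, and the choice to compute duration as the number of epochs times the epoch length are all assumptions made for illustration.

```r
# Hypothetical epoch-level rows shaped like the full dataset (values are made up).
epochs <- data.frame(
  bout  = c(1, 1, 1, 2, 2),
  time  = as.POSIXct("2012-04-07 00:02:30", tz = "UTC") + c(0, 30, 60, 600, 630),
  speed = c(1.0, 3.0, 2.0, 4.0, 5.0)
)

# Hypothetical helper: one summary row per bout.
summarize_bouts_sketch <- function(epochs, epoch_seconds = 30) {
  per_bout <- lapply(split(epochs, epochs$bout), function(b) {
    data.frame(
      bout         = b$bout[1],
      median_speed = median(b$speed),
      bout_start   = min(b$time),
      duration     = nrow(b) * epoch_seconds / 60  # minutes, assuming fixed-length epochs
    )
  })
  out <- do.call(rbind, per_bout)
  rownames(out) <- NULL
  out
}

bout_summary <- summarize_bouts_sketch(epochs)
```

Because the result keeps only per-bout aggregates, no latitude or longitude values carry over into it.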
- Walk bout: a `walk_bout` is defined based on the scientific literature as follows. Assuming a greedy algorithm and consideration of inactive time as consecutive, a walk bout is any contiguous period of time where the active epochs have accelerometry counts above the minimum threshold of 500 counts per epoch (CPE) (to allow for capture of light physical activity such as slow walking) and the time period:
  - Begins with an active epoch preceded by a non-walk bout
  - Ends with an active epoch followed by at least 4 consecutive 30-second epochs of inactivity
  - Contains at least 10 cumulative 30-second epochs of activity
  - Is not a dwell bout
  - Has a bout median speed, based on GPS data, between 2 and 6 kilometers per hour (our reference walking speeds)
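The accelerometry part of this definition can be sketched in base R. This is a simplified illustration, not the algorithm used by `identify_walk_bouts_in_gps_and_accelerometry_data`: the function `label_candidate_bouts` and its parameter names are hypothetical, an epoch is treated as active when its count is at least 500 (an assumption about how the threshold boundary is handled), and the dwell and speed criteria are ignored.

```r
# Hypothetical helper: label candidate bouts from per-epoch activity counts.
# Returns an integer bout id per epoch, or NA for epochs outside any bout.
label_candidate_bouts <- function(counts, min_cpe = 500, max_gap = 4, min_active = 10) {
  active <- counts >= min_cpe
  runs <- rle(active)

  # Inactive runs of at least `max_gap` epochs end a bout;
  # shorter inactive runs stay inside the surrounding bout.
  is_break <- rep(!runs$values & runs$lengths >= max_gap, runs$lengths)

  # Number the stretches of epochs between breaks.
  segment <- cumsum(c(TRUE, diff(is_break) != 0))

  bout <- rep(NA_integer_, length(counts))
  next_id <- 1L
  for (s in unique(segment[!is_break])) {
    idx <- which(segment == s)
    act <- idx[active[idx]]
    # Keep the stretch only if it has enough cumulative active epochs,
    # trimmed so that it begins and ends with an active epoch.
    if (length(act) >= min_active) {
      bout[min(act):max(act)] <- next_id
      next_id <- next_id + 1L
    }
  }
  bout
}

# Made-up counts: 5 idle, 6 active, 2 idle, 5 active, 4 idle, 3 active, 4 idle.
counts <- c(rep(0, 5), rep(600, 6), rep(0, 2), rep(600, 5),
            rep(0, 4), rep(600, 3), rep(0, 4))
bout_id <- label_candidate_bouts(counts)
```

In this example the two-epoch pause is absorbed into the first candidate bout (epochs 6 through 18, with 11 active epochs), while the final three active epochs are too few to form a bout.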
Accordingly, the following non-walk-bouts are defined as:
- Non-walk bout due to slow pace: a `non_walk_slow` bout is a bout where the median speed is too slow to be considered walking.
- Non-walk bout due to fast pace: a `non_walk_fast` bout is a bout where the median speed is too fast to be considered walking.
- Non-walk bout due to high CPE: a `non_walk_too_vigorous` bout is a bout where the average CPE is too high to be considered walking (e.g., running or biking).
- Dwell bout: a `dwell_bout` is a bout where the radius of GPS points is below our threshold for considering someone to have stayed in one place.
- Non-walk bout due to incomplete GPS coverage: a `non_walk_incomplete_gps` bout is a bout where the GPS coverage is too low to be considered complete.
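The categories above can be read as a sequence of checks on per-bout statistics. The sketch below is only an illustration: `categorize_bout_sketch` is a hypothetical function, the order of the checks is an assumption, and the CPE, radius, and GPS coverage thresholds passed in the example are placeholder values rather than the package's defaults. Only the 2 and 6 kilometers per hour speed limits come from the definition of a walk bout.

```r
# Hypothetical helper: assign a bout category from per-bout statistics.
# Thresholds without defaults must be supplied by the caller.
categorize_bout_sketch <- function(median_speed_kmh, mean_cpe, gps_radius, gps_coverage,
                                   max_cpe, min_radius, min_gps_coverage,
                                   min_speed_kmh = 2, max_speed_kmh = 6) {
  # The order of these checks is an assumption made for this sketch.
  if (gps_coverage < min_gps_coverage) return("non_walk_incomplete_gps")
  if (mean_cpe > max_cpe)              return("non_walk_too_vigorous")
  if (gps_radius < min_radius)         return("dwell_bout")
  if (median_speed_kmh < min_speed_kmh) return("non_walk_slow")
  if (median_speed_kmh > max_speed_kmh) return("non_walk_fast")
  "walk_bout"
}

# Placeholder thresholds, chosen only to exercise each branch.
thresholds <- list(max_cpe = 3000, min_radius = 50, min_gps_coverage = 0.5)

categorize <- function(speed, cpe, radius, coverage) {
  do.call(categorize_bout_sketch,
          c(list(speed, cpe, radius, coverage), thresholds))
}

categorize(4.0, 800, 200, 0.9)   # within all limits
categorize(1.0, 800, 200, 0.9)   # median speed below 2 km/h
categorize(9.0, 800, 200, 0.9)   # median speed above 6 km/h
```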
In order to better visualize our bouts, we can also plot the accelerometry counts and GPS radius.
accelerometry_counts <- make_smallest_bout_without_metadata()
gps_data <- generate_walking_in_seattle_gps_data()
generate_bout_plot(accelerometry_counts, gps_data, 1)