Title: | Piecewise Linear Segmentation by Dynamic Programming |
---|---|
Description: | Piecewise linear segmentation of ordered data by a dynamic programming algorithm. The algorithm was developed for time series data, e.g. growth curves, and for genome-wide read-count data from next generation sequencing, but is broadly applicable. Generic implementations of dynamic programming routines allow to scan for optimal segmentation parameters and test custom segmentation criteria ("scoring functions"). |
Authors: | Rainer Machne [aut, cre] , Peter F. Stadler [aut] |
Maintainer: | Rainer Machne <[email protected]> |
License: | GPL (>= 2) |
Version: | 0.1.3 |
Built: | 2024-11-17 05:56:27 UTC |
Source: | https://gitlab.com/raim/dpseg |
dpseg
results or a table of segment borders.addLm
takes a segment table (with start/end columns) or a
result object from code dpseg
, calls base R function
lm
for each segment, and adds slope, intercept, r2 and
variance of residuals to the segment table. This data is required
for plot and predict method, eg. when dpseg
was called with
a pre-calculated scoring matrix, or alternative scoring functions
or recursion.
addLm(dpseg, x, y)
addLm(dpseg, x, y)
dpseg |
result object (class "dpseg") returned by function
|
x |
original x-data used |
y |
original y-data used |
Returns the input dpseg
object or segment table, but
with original xy
data and fit
results from a
linear regression with base R (lm(y~x)
) added to the
results and linear regression coefficient and goodness of fit
meaurs in the main segments
table.
## 1: run dpseg with store.matrix=TRUE to allow re-rung segs <- dpseg(x=oddata$Time, y=log(oddata$A3), store.matrix=TRUE) ## 2: run dpseg with score function matrix input segr <- dpseg(y=segs$SCR, P=0.0001, verb=1) ## NOTE: only data indices i and j are provided in results print(segr) ## 3: add original data and linear regression for segments ## NOTE: now also plot and predict methods work segr <- addLm(segr, x=oddata$Time, y=log(oddata$A3)) print(segr)
## 1: run dpseg with store.matrix=TRUE to allow re-rung segs <- dpseg(x=oddata$Time, y=log(oddata$A3), store.matrix=TRUE) ## 2: run dpseg with score function matrix input segr <- dpseg(y=segs$SCR, P=0.0001, verb=1) ## NOTE: only data indices i and j are provided in results print(segr) ## 3: add original data and linear regression for segments ## NOTE: now also plot and predict methods work segr <- addLm(segr, x=oddata$Time, y=log(oddata$A3)) print(segr)
dpseg
segment break-pointsBacktracing segment borders from the imax
vector of a
dpseg
recursion. This function is implemented more
efficiently in Rcpp; the R code is kept for documentation,
benchmarking and development.
backtrace_r(imax, jumps = 0)
backtrace_r(imax, jumps = 0)
imax |
integer vector of segment borders as returned by
|
jumps |
allwo discontinuous jumps: move 1 index position back,
only for |
an integer vector of segment ends
dpseg
splits a curve (x,y data) into linear segments by a
straight forward dynamic programming recursion:
where score is a measure of the
goodnes of the fit of a linear regression (equiv. to
lm(y~x)
) between data points . The default scoring
function is simply the negative variance of residuals of the linear
regression (see arguments
type
and scoref
). P
is a break-point penality that implicitly regulates the number of
segments (higher P: longer segments), and jumps==1
allows
for disjoint segments. The arguments minl
and maxl
specify minimal () and maximal (
)
segment lengths, which allows to significantly decrease memory
usage when expected segment lengths are known.
dpseg( x, y, maxl, jumps = FALSE, P = 0, minl = 3, S0 = 1, type = "var", scoref, verb = 1, move, store.values = TRUE, store.matrix = FALSE, add.lm = FALSE, recursion, backtrace, ... )
dpseg( x, y, maxl, jumps = FALSE, P = 0, minl = 3, S0 = 1, type = "var", scoref, verb = 1, move, store.values = TRUE, store.matrix = FALSE, add.lm = FALSE, recursion, backtrace, ... )
x |
x-values, generated as |
y |
y-values, or a pre-calculated scoring function matrix
|
maxl |
maximal segment length, |
jumps |
allow for jumps between segments, if |
P |
break-point penalty, increase to get longer segments with lower scores (eg. higher residual variance) |
minl |
minimal segment length, |
S0 |
initialization of |
type |
type of scoring function: available are "var" for
"variance of residuals", "cor" for Pearson correlation, or "r2"
for r-squared; see the package |
scoref |
alternative scoring function |
verb |
print progress messages |
move |
logical indicating whether move is required in
backtracing, required for the alternative recursion |
store.values |
store scoring values (linear regression results) |
store.matrix |
store the fitscore matrix |
add.lm |
add a linear fit using R base |
recursion |
internal recursion function to be used for
segmentation; used for debugging, benchmarking and development,
and required for putative novel scoring functions |
backtrace |
internal function to be used for back-tracing;
used for debugging, benchmarking and development, and may be
required to test novel scoring functions |
... |
further arguments to |
See the vignette("dpseg")
for the theory and details on the
choice of scoring functions and selection of the penalty parameter
P.
Returns a list object of class dpseg
(with
print.dpseg
plot.dpseg
and predict.dpseg
methods). The main result of the algorithm is a table
(data.frame
) of predicted segments in list object
segments
. The original data, run parameters and
(optionally) additional data calculated and used by the
algorithm are also returned.
main result table: a data.frame
that lists
the start and end x-values of the segments (columns 'x1' and
'x1'), the start and end indices (i,j) in the data vectors
(columns 'start' and 'end'), and the linear regression
coefficients and goodness-of-fit measures for the segments in
columns: 'intercept', 'slope', 'r2' (r-squared), and 'var'
(variance of the residuals). If dpseg
was called with a
pre-calculated scoring matrix, the table only contains start
and end indices i,j. If option add.lm=TRUE
or the result
object was sent through function addLm
the table
additionally contains results from R's lm
, indicated by
an ".lm" suffix.
results of the recursion, ie. in above
equation.
vector , storing the
that yielded
, ie., the sole input for the backtracing
function.
linear regression coefficients and measures for the
segment ending at and starting at
. Only present if
store.valus=TRUE
.
scoring function matrix
where positions j are the columns and i the rows; a banded
matrix with non-NA values between
and
. Note, that this matrix can be re-used in
subsequent calls as
dpseg(y=previous$SCR)
which runs
much faster and allows to efficiently scan for alternative
parameters. Only present if store.matrix=TRUE
.
result objects from lm
. Only present if
add.lm=TRUE
.
result of the call to the backtracing function: ends of the segments.
original x/y data (xy.coords).
index of NA/Inf values that were removed before running the alorithm.
used parameters P
, jumps
,
maxl
and minl
.
The package strictly depends only on RcppEigen
.
All other dependencies are usually present in a
basic installation (stats
, graphics
, grDevices
).
Rainer Machne [email protected], Peter F. Stadler [email protected]
## calculate linear segments in semi-log bacterial growth data ## NOTE: library loads bacterial growth curve data as data.frame oddata segs <- dpseg(x=oddata$Time, y=log(oddata$A3), minl=5, P=0.0001, verb=1) ## inspect resulting segments print(segs) ## plot results (also see the movie method) plot(segs) ## predict method plot(predict(segs), type="l")
## calculate linear segments in semi-log bacterial growth data ## NOTE: library loads bacterial growth curve data as data.frame oddata segs <- dpseg(x=oddata$Time, y=log(oddata$A3), minl=5, P=0.0001, verb=1) ## inspect resulting segments print(segs) ## plot results (also see the movie method) plot(segs) ## predict method plot(predict(segs), type="l")
dpseg
implementationSee dpseg
for a current version of this algorithm.
Note: this was a first test implementation of the linear
piecewise segmentation by a dynamic programming approach.
This implementation is very slow. A much more efficient
version, dpseg
, calculates the variance of residuals of a
linear regression incrementally while looping through the recursion, and
is implemented in Rcpp
.
See there for details on the algorithm.
This version is kept alive, since it is a more general implementation,
allowing to test different regression and scoring functions by
command-line arguments.
dpseg_old( x, y, minl, maxl = length(x), P = 0, EPS, store.matrix = FALSE, fitscoref = fitscore, fitf = linregf, scoref = varscore, verb = 0 )
dpseg_old( x, y, minl, maxl = length(x), P = 0, EPS, store.matrix = FALSE, fitscoref = fitscore, fitf = linregf, scoref = varscore, verb = 0 )
x |
x-values |
y |
y-values |
minl |
minimal segment length |
maxl |
maximal segment length |
P |
jump penalty, increase to get fewer segments # @inheritParams score |
EPS |
a pre-calculated fitscore matrix, will be generated if missing |
store.matrix |
store the fitscore matrix |
fitscoref |
the heavy-load loop that fills the fitscore matrix
using |
fitf |
fit function, used in the scoring function
|
scoref |
function to calculate a score from the passed fit function |
verb |
print progress messages |
The recursion calculates , where the fitscore is the variance of the
residuals of a linear regression (
lm(y~x
)) between
to
,
P
is a jump penality that
implicitly regulates the number of segments, minl
and
maxl
are minimal and maximal lengths of segments. Uses
RcppEigen:fastLm
for linear
regression.
Returns a list of result structures very similar to the
list of class "dpseg" returned by function dpseg
,
except for the name of the scoring function matrix, here:
EPS
. See ?dpseg
for detailed information on these
structures.
## NOTE: not run because it's too slow for R CMD check --as-cran ## calculate linear segments in semi-log bacterial growth data ## NOTE: library loads bacterial growth curve data as data.frame oddata Sj <- dpseg_old(x=oddata$Time, y=log(oddata$A3), minl=5, P=0.0001, verb=1) ## inspect resulting segments print(Sj) ## plot results plot(Sj, delog=TRUE, log="y") ## NOTE: predict method & movie function do not work for dpseg_old
## NOTE: not run because it's too slow for R CMD check --as-cran ## calculate linear segments in semi-log bacterial growth data ## NOTE: library loads bacterial growth curve data as data.frame oddata Sj <- dpseg_old(x=oddata$Time, y=log(oddata$A3), minl=5, P=0.0001, verb=1) ## inspect resulting segments print(Sj) ## plot results plot(Sj, delog=TRUE, log="y") ## NOTE: predict method & movie function do not work for dpseg_old
The break-point penalty P in a dpseg
recursion,
should be in the range of expected values of the scoring
function. To find a good initial estimate for P when using the
default scoring fuction (see dpseg
), the data is
smoothed by smooth.spline
and the variance of residuals
reported.
estimateP(x, y, plot = FALSE, ...)
estimateP(x, y, plot = FALSE, ...)
x |
x-values |
y |
y-values |
plot |
plot the |
... |
parameters for |
Returns a double, variance of residuals of a spline fit
(var(smooth.spline(x,y, ...)$y -y)
)
x <- oddata$Time y <- log(oddata$A5) p <- estimateP(x=x, y=y, plot=TRUE) plot(dpseg(x=x, y=y, jumps=TRUE, P=round(p,3)))
x <- oddata$Time y <- log(oddata$A5) p <- estimateP(x=x, y=y, plot=TRUE) plot(dpseg(x=x, y=y, jumps=TRUE, P=round(p,3)))
dpseg
segmentation recursion as a movie.Generates a movie of the calculation steps while
looping through the recursion
. Plots are sent to the
active plot device or, if
path
is specified, to a video file
<path>/<file.name>.<format>
via a system call to Image
Magick's convert
.
Saving to a file likely only works on Linux systems with Image
Magick installed and convert
available in the $PATH
environment variable. format
are formats available for
convert
, eg. format="gif"
or format="mpeg"
.
See the vignette("dpseg")
for details on the plotted data.
movie( dpseg, fix.ylim = TRUE, frames, delay = 0.1, repeat.last = 5, ylab = "scoring function", ylab2 = "y", xlab = "x", path, file.name = "dpseg_movie", format = "gif", res = 200, ... )
movie( dpseg, fix.ylim = TRUE, frames, delay = 0.1, repeat.last = 5, ylab = "scoring function", ylab2 = "y", xlab = "x", path, file.name = "dpseg_movie", format = "gif", res = 200, ... )
dpseg |
|
fix.ylim |
fix the y-axis of the |
frames |
x range to show as movie frames |
delay |
delay between frames in seconds, between x11 plot
updates or as argument |
repeat.last |
repeat list frame this many times |
ylab |
left y-axis label, for the scoring funtion |
ylab2 |
right y-axis label, for the original data |
xlab |
x-axis label |
path |
path where both temporary jpeg files and the final movie file will be generated. If not specified the indidividual frames will be plotted to the active plot device. |
file.name |
name of the generated video file <path>/<file.name>.<format> |
format |
format of the video, all outputs that image magick's convert can generate, e.g. "mpg" or "gif" |
res |
resolution of the generated movie (pixels per inch) |
... |
arguments passed to default |
## NOTE: requires that dpseg is run with store.matrix=TRUE segs <- dpseg(x=oddata$Time, y=log(oddata$A3), minl=5, P=0.0001, store.matrix=TRUE) ## View the algorithm in action: movie(segs, delay=0) ## NOTE: if Image Magick's convert is installed you can set the path ## option to save the movie as <path>/<file.name>.<format>, where format ## can be "gif", "mpeg" or else, depending on the Image Magick installation.
## NOTE: requires that dpseg is run with store.matrix=TRUE segs <- dpseg(x=oddata$Time, y=log(oddata$A3), minl=5, P=0.0001, store.matrix=TRUE) ## View the algorithm in action: movie(segs, delay=0) ## NOTE: if Image Magick's convert is installed you can set the path ## option to save the movie as <path>/<file.name>.<format>, where format ## can be "gif", "mpeg" or else, depending on the Image Magick installation.
Optical density (OD) data from a 96-well microtiter plate experiment, growing Escherichia coli cells in M9 medium in a BMG Optima platereader.
oddata
oddata
A data frame with the measurement time in column 1 and
bacterial growth data (or blanks) in
2:ncol(oddata)
. Column names correspond to the well on
the microtiter plate.
Tom Rohr, Anna Behle, Rainer Machne, HHU Duesseldorf, 2018
dpseg
segmentation model.Plot method for a dpseg
segmentation model.
## S3 method for class 'dpseg' plot(x, delog = FALSE, col, main, res = 10, vlines = TRUE, ...)
## S3 method for class 'dpseg' plot(x, delog = FALSE, col, main, res = 10, vlines = TRUE, ...)
x |
result object returned by function |
delog |
plot exp(y) |
col |
optional color vector for segments |
main |
plot title, dpseg parameters will be plotted if missing |
res |
x-resolution factor for the predicted model line, the new
coordinates |
vlines |
plot vertical lines at the breakpoints. |
... |
arguments passed to default |
Silently returns the x$segments
table , with color values
added if they were missing in the input.
Predicted values based on a data segmentation model from
dpseg
.
## S3 method for class 'dpseg' predict(object, xout, ...)
## S3 method for class 'dpseg' predict(object, xout, ...)
object |
result object returned by function
|
xout |
new x-values at which to predict |
... |
not used |
Returns predicted linear segments as x,y coordinates
(grDevices::xy.coords
) at xout
.
x <- oddata$Time y <- log(oddata$A5) segs <- dpseg(x=x, y=y, P=0.0001) ## predict method plot(x=x, y=y, pch=19, cex=0.5) lines(predict(segs), col=2, lwd=2)
x <- oddata$Time y <- log(oddata$A5) segs <- dpseg(x=x, y=y, P=0.0001) ## predict method plot(x=x, y=y, pch=19, cex=0.5) lines(predict(segs), col=2, lwd=2)
dpseg
.Prints the main result table x$segments
, segment coordinates
and indices, and parameters from the recursion. See dpseg
for details.
## S3 method for class 'dpseg' print(x, n = 6, ...)
## S3 method for class 'dpseg' print(x, n = 6, ...)
x |
result object returned by function |
n |
print maximally n segments (argument
to |
... |
further arguments to |
P
valuesRuns the dpseg
recursion for different values of the
penalty parameter P
and returns a matrix with the used
P
values, the resulting number of segments and (optionally)
the median of segment variance of residuals.
scanP(x, y, P, var = TRUE, use.matrix = TRUE, plot = TRUE, verb = 1, ...)
scanP(x, y, P, var = TRUE, use.matrix = TRUE, plot = TRUE, verb = 1, ...)
x |
x-values |
y |
y-values |
P |
vector of penalties P to scan |
var |
add the median of the variances of residuals of all
segments to output (save time by |
use.matrix |
use the stored scoring function matrix for more
efficient scans; set this to |
plot |
plot results |
verb |
print progress messages |
... |
parameters for |
Returns a matrix with the penalties P in the first column, the number of segments in the second column and the median of variances in the third column.
x <- oddata$Time y <- log(oddata$A5) par(mai=c(par("mai")[1:3], par("mai")[2])) # to show right axis sp <- scanP(x=x, y=y, P=seq(-.01,.1,length.out=50), plot=TRUE)
x <- oddata$Time y <- log(oddata$A5) par(mai=c(par("mai")[1:3], par("mai")[2])) # to show right axis sp <- scanP(x=x, y=y, P=seq(-.01,.1,length.out=50), plot=TRUE)
Constructs a segment table from segment ends (imax
) returned
by dpseg
backtracing functions
backtrace_r
and backtrace_c
. Correct segment
break-points require to know whether segment recursion was run with
the jumps
option of dpseg
. In joint segments
(jumps=FALSE
) segment borders are part of both left and
right segments.
sgtable(ends, starts, jumps = TRUE)
sgtable(ends, starts, jumps = TRUE)
ends |
integer vector of segment ends |
starts |
integer vector of segment starts |
jumps |
same parameter as passed to recursion function, allowing for discontinous jumps (TRUE) or enforcing joint segments (FALSE) |
a table with segment start
and end
columns