Title: | Perform Phylogenetic Path Analysis |
---|---|
Description: | A comprehensive and easy to use R implementation of confirmatory phylogenetic path analysis as described by Von Hardenberg and Gonzalez-Voyer (2012) <doi:10.1111/j.1558-5646.2012.01790.x>. |
Authors: | Wouter van der Bijl [aut, cre] |
Maintainer: | Wouter van der Bijl <[email protected]> |
License: | GPL-3 |
Version: | 1.3.0 |
Built: | 2024-11-21 05:07:11 UTC |
Source: | https://github.com/ax3man/phylopath |
Extract and average the best supported models from a phylogenetic path analysis.
average(phylopath, cut_off = 2, avg_method = "conditional", ...)
average(phylopath, cut_off = 2, avg_method = "conditional", ...)
phylopath |
An object of class |
cut_off |
The CICc cut-off used to select the best models. Use
|
avg_method |
Either |
... |
Arguments to pass to phylolm::phylolm and phylolm::phyloglm. Provide |
An object of class fitted_DAG
.
candidates <- define_model_set( A = NL ~ RS, B = RS ~ NL + BM, .common = c(LS ~ BM, DD ~ NL, NL ~ BM) ) p <- phylo_path(candidates, rhino, rhino_tree) summary(p) # Models A and B have similar support, so we may decide to take # their average. avg_model <- average(p) # Print the average model to see coefficients, se and ci: avg_model ## Not run: # Plot to show the weighted graph: plot(avg_model) # One can see that an averaged model is not necessarily a DAG itself. # This model actually has a path in two directions. # Note that coefficients that only occur in one of the models become much # smaller when we use full averaging: coef_plot(avg_model) coef_plot(average(p, method = 'full')) ## End(Not run)
candidates <- define_model_set( A = NL ~ RS, B = RS ~ NL + BM, .common = c(LS ~ BM, DD ~ NL, NL ~ BM) ) p <- phylo_path(candidates, rhino, rhino_tree) summary(p) # Models A and B have similar support, so we may decide to take # their average. avg_model <- average(p) # Print the average model to see coefficients, se and ci: avg_model ## Not run: # Plot to show the weighted graph: plot(avg_model) # One can see that an averaged model is not necessarily a DAG itself. # This model actually has a path in two directions. # Note that coefficients that only occur in one of the models become much # smaller when we use full averaging: coef_plot(avg_model) coef_plot(average(p, method = 'full')) ## End(Not run)
Perform model averaging on a list of DAGs.
average_DAGs( fitted_DAGs, weights = rep(1, length(coef)), avg_method = "conditional", ... )
average_DAGs( fitted_DAGs, weights = rep(1, length(coef)), avg_method = "conditional", ... )
fitted_DAGs |
A list of |
weights |
A vector of associated model weights. |
avg_method |
Either |
... |
Use of the ellipses is deprecated. |
An object of class fitted_DAG
, including standard errors and
confidence intervals.
# Normally, I would advocate the use of the phylo_path and average # functions, but this code shows how to average any set of models. Note # that not many checks are implemented, so you may want to be careful and # make sure the DAGs make sense and contain the same variables! candidates <- define_model_set( A = NL ~ BM, B = NL ~ LS, .common = c(LS ~ BM, DD ~ NL) ) fit_cand <- lapply(candidates, est_DAG, rhino, rhino_tree, model = 'lambda', method = 'logistic_MPLE') ave_cand <- average_DAGs(fit_cand) coef_plot(ave_cand)
# Normally, I would advocate the use of the phylo_path and average # functions, but this code shows how to average any set of models. Note # that not many checks are implemented, so you may want to be careful and # make sure the DAGs make sense and contain the same variables! candidates <- define_model_set( A = NL ~ BM, B = NL ~ LS, .common = c(LS ~ BM, DD ~ NL) ) fit_cand <- lapply(candidates, est_DAG, rhino, rhino_tree, model = 'lambda', method = 'logistic_MPLE') ave_cand <- average_DAGs(fit_cand) coef_plot(ave_cand)
Extract and estimate the best supported model from a phylogenetic path analysis.
best(phylopath, ...)
best(phylopath, ...)
phylopath |
An object of class |
... |
Arguments to pass to phylolm::phylolm and phylolm::phyloglm. Provide |
An object of class fitted_DAG
.
candidates <- define_model_set( A = NL ~ BM, B = NL ~ LS, .common = c(LS ~ BM, DD ~ NL) ) p <- phylo_path(candidates, rhino, rhino_tree) best_model <- best(p) # Print the best model to see coefficients, se and ci: best_model # Plot to show the weighted graph: plot(best_model)
candidates <- define_model_set( A = NL ~ BM, B = NL ~ LS, .common = c(LS ~ BM, DD ~ NL) ) p <- phylo_path(candidates, rhino, rhino_tree) best_model <- best(p) # Print the best model to see coefficients, se and ci: best_model # Plot to show the weighted graph: plot(best_model)
Extract and estimate an arbitrary model from a phylogenetic path analysis.
choice(phylopath, choice, ...)
choice(phylopath, choice, ...)
phylopath |
An object of class |
choice |
A character string of the name of the model to be chosen, or
the index in |
... |
Arguments to pass to phylolm::phylolm and phylolm::phyloglm. Provide |
An object of class fitted_DAG
.
candidates <- define_model_set( A = NL ~ BM, B = NL ~ LS, .common = c(LS ~ BM, DD ~ NL) ) p <- phylo_path(candidates, rhino, rhino_tree) my_model <- choice(p, "B") # Print the best model to see coefficients, se and ci: my_model # Plot to show the weighted graph: plot(my_model)
candidates <- define_model_set( A = NL ~ BM, B = NL ~ LS, .common = c(LS ~ BM, DD ~ NL) ) p <- phylo_path(candidates, rhino, rhino_tree) my_model <- choice(p, "B") # Print the best model to see coefficients, se and ci: my_model # Plot to show the weighted graph: plot(my_model)
A data set with binary traits, used in an analysis on the evolution of cooperative breeding by Dey et al 2017. Variable names are shortened for easy of use and consist of cooperative breeding (C), mating system (M), parental care (P), social grouping (G) and diet (D). All traits are coded as two level factors.
cichlids
cichlids
An object of class data.frame
with 69 rows and 5 columns.
Dey, C.J., O'Connor, C.M., Wilkinson, H., Shultz, S., Balshine, S. & Fitzpatrick, J.L. 2017. Direct benefits and evolutionary transitions to complex societies. Nat Ecol Evol 1: 137.
The phylogenetic tree of cichlid species that accompanies the cichlids
dataset. The phylogeny is based on five nuclear genes and three
mitochondrial genes.
cichlids_tree
cichlids_tree
An object of class phylo
of length 4.
Dey, C.J., O'Connor, C.M., Wilkinson, H., Shultz, S., Balshine, S. & Fitzpatrick, J.L. 2017. Direct benefits and evolutionary transitions to complex societies. Nat Ecol Evol 1: 137.
Plot path coefficients and their confidence intervals or standard errors.
coef_plot( fitted_DAG, error_bar = "ci", order_by = "default", from = NULL, to = NULL, reverse_order = FALSE )
coef_plot( fitted_DAG, error_bar = "ci", order_by = "default", from = NULL, to = NULL, reverse_order = FALSE )
fitted_DAG |
A fitted DAG, usually obtained by |
error_bar |
Whether to use confidence intervals ( |
order_by |
By |
from |
Only show path coefficients from these nodes. Supply as a character vector. |
to |
Only show path coefficients to these nodes. Supply as a character vector. |
reverse_order |
If |
A ggplot
object.
d <- DAG(LS ~ BM, NL ~ BM, DD ~ NL + LS) plot(d) d_fitted <- est_DAG(d, rhino, rhino_tree, 'lambda') plot(d_fitted) coef_plot(d_fitted, error_bar = "se") # to create a horizontal version, use this: coef_plot(d_fitted, error_bar = "se", reverse_order = TRUE) + ggplot2::coord_flip()
d <- DAG(LS ~ BM, NL ~ BM, DD ~ NL + LS) plot(d) d_fitted <- est_DAG(d, rhino, rhino_tree, 'lambda') plot(d_fitted) coef_plot(d_fitted, error_bar = "se") # to create a horizontal version, use this: coef_plot(d_fitted, error_bar = "se", reverse_order = TRUE) + ggplot2::coord_flip()
This function is a simple wrapper around the function from the ggm
package with the same name. The only differences are that the order
argument defaults to TRUE
and that it adds a DAG
class for
easy plotting. Typically, one would use define_model_set()
to
create models for use with the phylopath
package.
DAG(..., order = TRUE)
DAG(..., order = TRUE)
... |
a sequence of model formulae |
order |
logical, defaulting to |
Supply a formulas for the model as arguments. Formulas should be of the
form child ~ parent`` and describe each path in your model. Multiple children of a single parent can be combined into a single formula:
child ~ parent1 + parent2. Finally, an isolate (unconnected variable) can be included as being connected to itself:
isolate ~ isolate'.
An object of classes matrix
and DAG
# Use formula notation to create DAGs: plot(DAG(A~B, B~C)) # Use + to easily add multiple parents to a node: plot(DAG(A~B+C)) # Add a node as it's own parent to create an isolate: plot(DAG(A~B+C, D~D))
# Use formula notation to create DAGs: plot(DAG(A~B, B~C)) # Use + to easily add multiple parents to a node: plot(DAG(A~B+C)) # Add a node as it's own parent to create an isolate: plot(DAG(A~B+C, D~D))
This is a convenience function to quickly and clearly define a set of causal
models. Supply a list of formulas for each model, using either c()
. Formulas
should be of the form child ~ parent
and describe each path in your model.
Multiple children of a single parent can be combined into a single formula:
child ~ parent1 + parent2
.
define_model_set(..., .common = NULL)
define_model_set(..., .common = NULL)
... |
Named arguments, which each are a lists of formulas defining the paths of a causal model. |
.common |
A list of formulas that contain causal paths that are common to each model. |
This function uses ggm::DAG()
.
A list of models, each of class matrix
and DAG
.
(m <- define_model_set( A = c(a~b, b~c), B = c(b~a, c~b), .common = c(d~a))) plot_model_set(m)
(m <- define_model_set( A = c(a~b, b~c), B = c(b~a, c~b), .common = c(d~a))) plot_model_set(m)
Add standardized path coefficients to a DAG.
est_DAG(DAG, data, tree, model, method, boot = 0, ...)
est_DAG(DAG, data, tree, model, method, boot = 0, ...)
DAG |
A directed acyclic graph, typically created with |
data |
A |
tree |
A phylogenetic tree of class |
model |
The evolutionary model used for the regressions on continuous variables. See phylolm::phylolm for options and details. Defaults to Pagel's lambda model |
method |
The estimation method for the binary models. See phylolm::phyloglm for options and details. Defaults to logistic MPLE. |
boot |
The number of bootstrap replicates used to estimate confidence intervals. |
... |
Arguments passed on to
Arguments passed on to
|
An object of class fitted_DAG
.
d <- DAG(LS ~ BM, NL ~ BM, DD ~ NL + LS) plot(d) d_fitted <- est_DAG(d, rhino, rhino_tree, 'lambda') plot(d_fitted)
d <- DAG(LS ~ BM, NL ~ BM, DD ~ NL + LS) plot(d) d_fitted <- est_DAG(d, rhino, rhino_tree, 'lambda') plot(d_fitted)
Continuous variables are modeled using phylolm::phylolm, while binary traits are modeled using phylolm::phyloglm.
phylo_path( model_set, data, tree, model = "lambda", method = "logistic_MPLE", order = NULL, parallel = NULL, na.rm = TRUE, ... )
phylo_path( model_set, data, tree, model = "lambda", method = "logistic_MPLE", order = NULL, parallel = NULL, na.rm = TRUE, ... )
model_set |
A list of directed acyclic graphs. These are matrices,
typically created with |
data |
A |
tree |
A phylogenetic tree of class |
model |
The evolutionary model used for the regressions on continuous variables. See phylolm::phylolm for options and details. Defaults to Pagel's lambda model |
method |
The estimation method for the binary models. See phylolm::phyloglm for options and details. Defaults to logistic MPLE. |
order |
Causal order of the included variable, given as a character vector. This is used to determine which variable should be the dependent in the dsep regression equations. If left unspecified, the order will be automatically determined. If the combination of all included models is itself a DAG, then the ordering of that full model is used. Otherwise, the most common ordering between each pair of variables is used to create a general ordering. |
parallel |
Superseded From v1.2 |
na.rm |
Should rows that contain missing values be dropped from the data as necessary (with a message)? |
... |
Arguments passed on to
Arguments passed on to
|
Parallel processing: From v1.2, phylopath
uses the future
framework
for parallel processing. This is compatible with the parallel computation
within the underlying phylolm
, making it easy to enable parallel
processing of multiple models, and of bootstrap replicates. To enable,
simply set a parallel plan()
using the future
package. Typically, you'll
want to run future::plan("multisession", workers = n)
, where n
is the
number of cores. Now parallel processing is enabled. Return to sequential
processing using future::plan("sequential")
A phylopath object, with the following components:
for each model a table with separation statements and statistics.
the DAGs
the supplied data
the supplied tree
the employed model of evolution in phylolm
the employed method in phyloglm
any additional arguments given, these are passed on to downstream functions
any warnings generated by the models
#see vignette('intro_to_phylopath') for more details candidates <- define_model_set( A = NL ~ BM, B = NL ~ LS, .common = c(LS ~ BM, DD ~ NL) ) p <- phylo_path(candidates, rhino, rhino_tree) # Printing p gives some general information: p # And the summary gives statistics to compare the models: summary(p)
#see vignette('intro_to_phylopath') for more details candidates <- define_model_set( A = NL ~ BM, B = NL ~ LS, .common = c(LS ~ BM, DD ~ NL) ) p <- phylo_path(candidates, rhino, rhino_tree) # Printing p gives some general information: p # And the summary gives statistics to compare the models: summary(p)
Plot several causal hypothesis at once.
plot_model_set( model_set, labels = NULL, algorithm = "kk", manual_layout = NULL, text_size = 5, box_x = 12, box_y = 10, edge_width = 1, curvature = 0.05, rotation = 0, flip_x = FALSE, flip_y = FALSE, nrow = NULL, arrow = grid::arrow(type = "closed", 15, grid::unit(10, "points")) )
plot_model_set( model_set, labels = NULL, algorithm = "kk", manual_layout = NULL, text_size = 5, box_x = 12, box_y = 10, edge_width = 1, curvature = 0.05, rotation = 0, flip_x = FALSE, flip_y = FALSE, nrow = NULL, arrow = grid::arrow(type = "closed", 15, grid::unit(10, "points")) )
model_set |
A list of |
labels |
An optional set of labels to use for the nodes. This should be a named vector, of
the form |
algorithm |
A layout algorithm from |
manual_layout |
Alternatively, precisely define the layout yourself, by providing a
|
text_size |
Size of the node label text. |
box_x |
To avoid the arrows colliding with the nodes, specify the rectangular dimensions of an invisible box around each node. If you have long labels, you need to increase this. |
box_y |
To avoid the arrows colliding with the nodes, specify the rectangular dimensions of an invisible box around each node. If you have multi-line labels, you need to increase this. |
edge_width |
Width of the edges. |
curvature |
Curvature of the edges. A slight curvature can look pretty. |
rotation |
Supply the degrees you want to rotate the layout by. This is useful in order to put rotate your upstream nodes towards the top if needed. |
flip_x |
Whether to flip the node positions horizontally. |
flip_y |
Whether to flip the node positions vertically. |
nrow |
Number of rows to display the models on. |
arrow |
A The order of facets is taken from the ordering of the list, with the facet labels coming from the names of the list. If the list is unnamed, sequential lettering is used. |
A ggplot
object.
m <- list(one = DAG(a ~ b + c + d), two = DAG(a ~ b, b ~ c, d ~ d)) plot_model_set(m) plot_model_set(m, algorithm = "sugiyama")
m <- list(one = DAG(a ~ b + c + d), two = DAG(a ~ b, b ~ c, d ~ d)) plot_model_set(m) plot_model_set(m, algorithm = "sugiyama")
Plot a directed acyclic graph.
## S3 method for class 'DAG' plot( x, labels = NULL, algorithm = "sugiyama", manual_layout = NULL, text_size = 6, box_x = 12, box_y = 8, edge_width = 1.5, curvature = 0.02, rotation = 0, flip_x = FALSE, flip_y = FALSE, arrow = grid::arrow(type = "closed", 18, grid::unit(15, "points")), ... )
## S3 method for class 'DAG' plot( x, labels = NULL, algorithm = "sugiyama", manual_layout = NULL, text_size = 6, box_x = 12, box_y = 8, edge_width = 1.5, curvature = 0.02, rotation = 0, flip_x = FALSE, flip_y = FALSE, arrow = grid::arrow(type = "closed", 18, grid::unit(15, "points")), ... )
x |
A 'DAG“ object, usually created with the |
labels |
An optional set of labels to use for the nodes. This should be a named vector, of
the form |
algorithm |
A layout algorithm from igraph, see
|
manual_layout |
Alternatively, precisely define the layout yourself, by providing a
|
text_size |
Size of the node label text. |
box_x |
To avoid the arrows colliding with the nodes, specify the rectangular dimensions of an invisible box around each node. If you have long labels, you need to increase this. |
box_y |
To avoid the arrows colliding with the nodes, specify the rectangular dimensions of an invisible box around each node. If you have multi-line labels, you need to increase this. |
edge_width |
Width of the edges. |
curvature |
Curvature of the edges. A slight curvature can look pretty. |
rotation |
Supply the degrees you want to rotate the layout by. This is useful in order to put rotate your upstream nodes towards the top if needed. |
flip_x |
Whether to flip the node positions horizontally. |
flip_y |
Whether to flip the node positions vertically. |
arrow |
A The order of facets is taken from the ordering of the list, with the facet labels coming from the names of the list. If the list is unnamed, sequential lettering is used. |
... |
Not used. |
d <- DAG(a ~ b + c + d) plot(d) # Plot with manually defined positions: ml <- data.frame( name = c('a', 'b', 'c', 'd'), x = c(1, 1, 2, 2), y = c(1, 2, 1, 2) ) plot(d, manual_layout = ml)
d <- DAG(a ~ b + c + d) plot(d) # Plot with manually defined positions: ml <- data.frame( name = c('a', 'b', 'c', 'd'), x = c(1, 1, 2, 2), y = c(1, 2, 1, 2) ) plot(d, manual_layout = ml)
Plot a directed acyclic graph with path coefficients.
## S3 method for class 'fitted_DAG' plot( x, type = "width", labels = NULL, algorithm = "sugiyama", manual_layout = NULL, text_size = 6, box_x = 12, box_y = 8, edge_width = 1.25, curvature = 0.02, rotation = 0, flip_x = FALSE, flip_y = FALSE, arrow = grid::arrow(type = "closed", 18, grid::unit(15, "points")), colors = c("firebrick", "navy"), show.legend = TRUE, width_const = NULL, ... )
## S3 method for class 'fitted_DAG' plot( x, type = "width", labels = NULL, algorithm = "sugiyama", manual_layout = NULL, text_size = 6, box_x = 12, box_y = 8, edge_width = 1.25, curvature = 0.02, rotation = 0, flip_x = FALSE, flip_y = FALSE, arrow = grid::arrow(type = "closed", 18, grid::unit(15, "points")), colors = c("firebrick", "navy"), show.legend = TRUE, width_const = NULL, ... )
x |
An object of class |
type |
How to express the weight of the path. Either |
labels |
An optional set of labels to use for the nodes. This should be a named vector, of
the form |
algorithm |
A layout algorithm from |
manual_layout |
Alternatively, precisely define the layout yourself, by providing a
|
text_size |
Size of the node label text. |
box_x |
To avoid the arrows colliding with the nodes, specify the rectangular dimensions of an invisible box around each node. If you have long labels, you need to increase this. |
box_y |
To avoid the arrows colliding with the nodes, specify the rectangular dimensions of an invisible box around each node. If you have multi-line labels, you need to increase this. |
edge_width |
Width of the edges. |
curvature |
Curvature of the edges. A slight curvature can look pretty. |
rotation |
Supply the degrees you want to rotate the layout by. This is useful in order to put rotate your upstream nodes towards the top if needed. |
flip_x |
Whether to flip the node positions horizontally. |
flip_y |
Whether to flip the node positions vertically. |
arrow |
A The order of facets is taken from the ordering of the list, with the facet labels coming from the names of the list. If the list is unnamed, sequential lettering is used. |
colors |
The end points of the continuous color scale. Keep in mind that red and green are obvious colors to use, but are better to be avoided because of color blind users. |
show.legend |
Whether a legend for the color scale should be shown. |
width_const |
Deprecated. |
... |
Not used. |
d <- DAG(LS ~ BM, NL ~ BM, DD ~ NL + LS) d_fitted <- est_DAG(d, rhino, rhino_tree, 'lambda') plot(d_fitted)
d <- DAG(LS ~ BM, NL ~ BM, DD ~ NL + LS) d_fitted <- est_DAG(d, rhino, rhino_tree, 'lambda') plot(d_fitted)
A dataset with continuous variables affecting the conservation Status
of
mammalian species (the IUCN red list of threatened species).
red_list
red_list
An object of class data.frame
with 474 rows and 7 columns.
It includes the following variables: brain size (Br
), body size (B
),
gestation period (G
), litter size (L
), weening age (W
), population
density (P
) and vulnerability to extinction (Status
).
Gonzalez-Voyer A, Gonzalez-Suarez M, Vila C, Revilla E (2016) Larger brain size indirectly increases vulnerability to extinction in mammals. Evolution 70:1364-1375. doi: 10.1111/evo.12943
This is the accompanying phylogeny for the red_list data set. It is based on the updated mammalian supertree by Bininda-Emonds et al. 2007 & Fritz et al. 2009.
red_list_tree
red_list_tree
An object of class phylo
of length 4.
Gonzalez-Voyer, A. Gonzalez-Suarez M. Vila C. and Revilla E. 2016. Larger brain size indirectly increases vulnerability to extinction in mammals. Evolution 70:1364-1375. doi: 10.1111/evo.12943.
Bininda-Emonds, O. R. P., M. Cardillo, K. E. Jones, R. D. E. MacPhee, R. M. D. Beck, R. Grenyer, S. A. Price, R. A. Vos, J. L. Gittleman, and A. Purvis. 2007. The delayed rise of present-day mammals. Nature 446:507-512.
Fritz, S. A., O. R. P. Bininda-Emonds, and A. Purvis. 2009. Geographical variation in predictors of mammalian extinction risk: big is bad, but only in the tropics. Ecol. Lett. 12:538-549.
A simulated dataset, as used by Gonzalez-Voyer and Von Hardenberg as an example, containing variables on body mass (BM), litter size (LS), nose length (NL), dispersal distance (DD) and range size (RS).
rhino
rhino
An object of class data.frame
with 100 rows and 6 columns.
Gonzalez-Voyer A & von Hardenberg A. 2014. An Introduction to Phylogenetic Path Analysis. Chapter 8. In: Garamszegi LZ (ed.), Modern Phylogenetic Comparative Methods and Their Application in Evolutionary Biology. pp. 201-229. Springer-Verlag Berlin Heidelberg. doi:10.1111/j.1558-5646.2012.01790.x
A phylogenetic tree for the 100 species of the rhino
dataset.
rhino_tree
rhino_tree
An object of class phylo
of length 4.
Gonzalez-Voyer A & von Hardenberg A. 2014. An Introduction to Phylogenetic Path Analysis. Chapter 8. In: Garamszegi LZ (ed.), Modern Phylogenetic Comparative Methods and Their Application in Evolutionary Biology. pp. 201-229. Springer-Verlag Berlin Heidelberg. doi:10.1111/j.1558-5646.2012.01790.x
Use this function after running phylo_path()
to conveniently print any generated warnings
to the screen. You can either provide no arguments, which will only work if you run it directly
after the analysis, or you have to provide the phylopath
object manually.
show_warnings(phylopath = NULL)
show_warnings(phylopath = NULL)
phylopath |
A |