Title: | Distance Matrix Utilities |
---|---|
Description: | Functions to re-arrange, extract, and work with distances. |
Authors: | Kyle Bittinger [aut, cre] |
Maintainer: | Kyle Bittinger <[email protected]> |
License: | GPL-3 |
Version: | 0.4.0.9000 |
Built: | 2025-03-06 04:33:29 UTC |
Source: | https://github.com/kylebittinger/usedist |
Compute the distance between group centroids
dist_between_centroids(d, idx1, idx2, squared = FALSE)
d | A distance matrix object of class dist. |
idx1 | A vector of items in group 1. |
idx2 | A vector of items in group 2. |
squared | If TRUE, return the squared distance between centroids. |
If you have a distance matrix, and the objects are partitioned into groups, you might like to know the distance between the group centroids. The centroid of each group is simply the center of mass for the group.
It is possible to infer the distance between group centroids directly from the distances between items in each group. The adonis test in the ecology package vegan takes advantage of this approach to carry out an ANOVA-like test on distances.
The approach rests on the assumption that the objects occupy some high-dimensional Euclidean space. However, we do not have to actually create the space to find the distance between centroids. Based on the assumption that such a space exists, we can use an algebraic formula to perform the computation.
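The formulas for this were presented by Apostol and Mnatsakanian in 2003, though we need to re-arrange equation 28 in their paper to get the value we want:
$$d = \sqrt{\frac{1}{n_1 n_2} SS_{12} - \frac{1}{n_1^2} SS_1 - \frac{1}{n_2^2} SS_2},$$
where $n_1$ is the number of samples in group 1, $SS_1$ is the sum of squared distances between items in group 1 (each pair counted once), and $SS_{12}$ is the sum of squared distances between items in group 1 and those in group 2. The quantities $n_2$ and $SS_2$ are defined analogously for group 2.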
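The rearranged formula can be checked numerically. The following base R sketch is not the package's implementation; the data and variable names (`ss1`, `ss2`, `ss12`, and so on) are illustrative assumptions. It infers the centroid distance from pairwise distances alone and compares it with the distance computed directly from coordinates.

```r
# Five points in a 3-dimensional Euclidean space (illustrative data).
set.seed(1)
x <- matrix(rnorm(15), nrow = 5, dimnames = list(LETTERS[1:5], NULL))
dm <- as.matrix(dist(x))

idx1 <- c("A", "B")        # items in group 1
idx2 <- c("C", "D", "E")   # items in group 2
n1 <- length(idx1)
n2 <- length(idx2)

# Sums of squared distances. The full symmetric sub-matrix counts every
# within-group pair twice, so divide by 2 to count each pair once.
ss1 <- sum(dm[idx1, idx1]^2) / 2
ss2 <- sum(dm[idx2, idx2]^2) / 2
ss12 <- sum(dm[idx1, idx2]^2)

# Centroid distance inferred from the distances alone.
d_formula <- sqrt(ss12 / (n1 * n2) - ss1 / n1^2 - ss2 / n2^2)

# Centroid distance computed directly from the coordinates.
c1 <- colMeans(x[idx1, , drop = FALSE])
c2 <- colMeans(x[idx2, , drop = FALSE])
d_direct <- sqrt(sum((c1 - c2)^2))

all.equal(d_formula, d_direct)
```

Because the points here really do live in a Euclidean space, the two values agree up to floating-point error.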
Sometimes, the distance between centroids is not a real number, because it is not possible to create a space where this distance exists. Mathematically, we get a negative number underneath the square root in the equation above. If this happens, the function returns NaN. If you'd like to have access to this value, you can set squared = TRUE to return the squared distance between centroids. In this case, you will never get NaN, but you might receive negative numbers in your result.
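To make the non-Euclidean case concrete, here is a minimal base R sketch using hypothetical distances that violate the triangle inequality. It applies the same arithmetic by hand (between-group mean of squared distances minus the within-group terms) rather than calling the package, and all names in it are illustrative.

```r
# Hypothetical distances: A and B are far apart, yet both are close to C.
m <- matrix(
  c( 0, 10, 1,
    10,  0, 1,
     1,  1, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("A", "B", "C"), c("A", "B", "C")))

idx1 <- c("A", "B")  # group 1
idx2 <- "C"          # group 2
n1 <- length(idx1)
n2 <- length(idx2)

ss1 <- sum(m[idx1, idx1]^2) / 2   # each within-group pair counted once
ss2 <- sum(m[idx2, idx2]^2) / 2   # a single item has no pairs, so this is 0
ss12 <- sum(m[idx1, idx2]^2)

# Squared centroid distance: 2/2 - 100/4 - 0 = -24
d_squared <- ss12 / (n1 * n2) - ss1 / n1^2 - ss2 / n2^2
d_squared

# The square root of a negative number is NaN (R also emits a warning).
suppressWarnings(sqrt(d_squared))
```

The negative squared value is the quantity that remains accessible when the squared result is requested instead of the distance itself.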
The distance between group centroids (see details).
Apostol, T.M. and Mnatsakanian, M.A. Sums of squares of distances in m-space. Math. Assoc. Am. Monthly 110, 516 (2003).
Retrieve distances from a dist object.
dist_get(d, idx1, idx2)
d | A distance matrix object of class dist. |
idx1, idx2 | Indices specifying the distances to extract. |
A vector of distances.
m4 <- matrix(1:16, nrow=4, dimnames=list(LETTERS[1:4]))
dm4 <- dist(m4)
dist_get(dm4, "A", "C")
dist_get(dm4, "A", c("A", "B", "C", "D"))
dist_get(dm4, c("A", "B", "C"), c("B", "D", "B"))
Create a data frame of distances between groups of items.
dist_groups(d, g)
d | A distance matrix object of class dist. |
g | A factor representing the groups of objects in d. |
A data frame with 6 columns:
Item1, Item2 | The items being compared. |
Group1, Group2 | The groups to which the items belong. |
Label | A convenient label for plotting or comparison. |
Distance | The distance between Item1 and Item2. |
m4 <- matrix(1:16, nrow=4, dimnames=list(LETTERS[1:4]))
dm4 <- dist(m4)
g4 <- rep(c("Control", "Treatment"), each=2)
dist_groups(dm4, g4)
Make a distance matrix using a custom distance function
dist_make(x, distance_fcn, ...)
x | A matrix of observations, one per row. |
distance_fcn | A function used to compute the distance between two rows of the data matrix. The two rows will be passed as the first and second arguments to distance_fcn. |
... | Additional arguments passed to distance_fcn. |
We do not set the call or method attributes of the dist object.
A dist object containing the distances between rows of the data matrix.
x <- matrix(sin(1:30), nrow=5)
rownames(x) <- LETTERS[1:5]
manhattan_distance <- function (v1, v2) sum(abs(v1 - v2))
dist_make(x, manhattan_distance)
Make a new distance matrix of centroid distances between multiple groups
dist_multi_centroids(d, g, squared = FALSE)
d | A distance matrix object of class dist. |
g | A factor representing the groups of items in d. |
squared | If TRUE, return the squared distance between centroids. |
A distance matrix of distances between the group centroids.
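This entry has no example, so the following base R sketch illustrates the kind of result described: a distance matrix with one row and column per group. It is an assumption-laden illustration rather than the package's implementation. The helper `centroid_dist_sq` and the data are hypothetical, and the arithmetic is the between-centroid formula documented for dist_between_centroids, applied to every pair of groups.

```r
# Illustrative data: six points in two dimensions, three groups of two.
x <- matrix(c(0, 0,  2, 0,  10, 0,  12, 0,  0, 10,  2, 10),
            ncol = 2, byrow = TRUE,
            dimnames = list(LETTERS[1:6], NULL))
g <- factor(rep(c("g1", "g2", "g3"), each = 2))
dm2 <- as.matrix(dist(x))^2   # squared distances between items

# Hypothetical helper: squared centroid distance between two index sets.
# Full symmetric sub-matrices count each within-group pair twice,
# hence the factor 2 in the within-group terms.
centroid_dist_sq <- function(i, j) {
  sum(dm2[i, j]) / (length(i) * length(j)) -
    sum(dm2[i, i]) / (2 * length(i)^2) -
    sum(dm2[j, j]) / (2 * length(j)^2)
}

idx <- split(seq_along(g), g)   # item positions for each group
res <- sapply(idx, function(i) sapply(idx, function(j) centroid_dist_sq(i, j)))
diag(res) <- 0                  # a centroid is at distance 0 from itself
centroid_dists <- as.dist(sqrt(res))
centroid_dists
```

With the package installed, the corresponding call following the usage shown above would be `dist_multi_centroids(dist(x), g)`.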
Set the names/labels of a dist object.
dist_setNames(d, nm)
d | A distance matrix object of class dist. |
nm | New labels for the rows/columns. |
A distance matrix with new row/column labels.
m4 <- matrix(1:16, nrow=4, dimnames=list(LETTERS[1:4]))
dm4 <- dist(m4)
dist_setNames(dm4, LETTERS[9:12])
Extract a subset of values from a distance matrix. This function also works to re-arrange the rows of a distance matrix, if they are provided in the desired order.
dist_subset(d, idx)
d | A distance matrix object of class dist. |
idx | Indices specifying the subset of distances to extract. |
A distance matrix.
m4 <- matrix(1:16, nrow=4, dimnames=list(LETTERS[1:4]))
dm4 <- dist(m4)
dist_subset(dm4, c("A", "B", "C"))
dist_subset(dm4, c("D", "C", "B", "A"))
Compute distances from each item to group centroids
dist_to_centroids(d, g, squared = FALSE)
d | A distance matrix object of class dist. |
g | A factor representing the groups of items in d. |
squared | If TRUE, return the squared distance to centroids. |
This function computes the distance from each item to the centroid positions of groups defined in the argument g. This is accomplished without determining the centroid positions directly; see the documentation for dist_between_centroids for details on this procedure.
If the distance can't be represented in a Euclidean space, the CentroidDistance is set to NaN. See the documentation for dist_between_centroids for further details.
A data frame with distances to the group centroids:
Item | A character vector of item labels from the dist object, or an integer vector of item locations if labels are not present. |
CentroidGroup | The group for which the centroid distance is given. The column type should match that of the argument g (the unique function is used to generate this column). |
CentroidDistance | Inferred distance from the item to the centroid position of the indicated group. |
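As a rough illustration of how an item-to-centroid distance can be inferred without locating the centroid, the base R sketch below treats each item as a group of size one and applies the same centroid formula. It is a sketch under stated assumptions, not the package's implementation: the data are made up, and the column names `Item` and `CentroidGroup` are chosen here for illustration (only CentroidDistance is named in the text above).

```r
# Illustrative data: four points in the plane, two groups.
x <- matrix(c(0, 0,  4, 0,  0, 3,  4, 3), ncol = 2, byrow = TRUE,
            dimnames = list(LETTERS[1:4], NULL))
g <- c("Control", "Control", "Treatment", "Treatment")
dm2 <- as.matrix(dist(x))^2   # squared distances between items

# One row per (item, group) combination.
res <- expand.grid(Item = rownames(x), CentroidGroup = unique(g),
                   stringsAsFactors = FALSE)

res$CentroidDistance <- mapply(function(item, grp) {
  members <- which(g == grp)
  n <- length(members)
  # Mean squared distance from the item to the group's members, minus the
  # group's within-group term (each pair counted once, hence the factor 2).
  sq <- sum(dm2[item, members]) / n - sum(dm2[members, members]) / (2 * n^2)
  sqrt(sq)
}, res$Item, res$CentroidGroup, USE.NAMES = FALSE)

res
```

In this example the Control centroid sits at (2, 0) and the Treatment centroid at (2, 3), so items are at distance 2 from their own group's centroid and sqrt(13) from the other.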
Convert a data frame in long format to a matrix
pivot_to_matrix(data, rows_from, cols_from, values_from, fill = 0)
pivot_to_numeric_matrix(data, obs_col, feature_col, value_col)
data | A data frame in long format. |
rows_from | The column indicating the row of the matrix. |
cols_from | The column indicating the column of the matrix. |
values_from | The column indicating the value to be placed inside the matrix. |
fill | The value to use for missing combinations of rows and columns. |
obs_col, feature_col, value_col | The same as rows_from, cols_from, and values_from. |
The parameters rows_from, cols_from, and values_from should be provided as bare column names.
This function requires the packages tidyr, rlang, and tibble to be installed. If they are not installed, the function will generate an error, with a message to install the appropriate packages.
pivot_to_numeric_matrix(): Specialized version for numeric values. Deprecated; use pivot_to_matrix instead.
longdata <- data.frame(
  sample_id = paste0("Sample", c(1, 1, 1, 2, 2, 3, 3)),
  feature_id = paste0("Feature", c(1, 2, 3, 1, 2, 2, 3)),
  counts = c(132, 41, 7, 56, 11, 929, 83))
pivot_to_matrix(longdata, sample_id, feature_id, counts)