Package 'lg'

Title:	Locally Gaussian Distributions: Estimation and Methods
Description:	An implementation of locally Gaussian distributions. It provides methods for implementing locally Gaussian multivariate density estimation, conditional density estimation, various independence tests for iid and time series data, a test for conditional independence and a test for financial contagion.
Authors:	Håkon Otneim [aut, cre]
Maintainer:	Håkon Otneim <[email protected]>
License:	GPL-3
Version:	0.4.1
Built:	2025-04-01 05:33:14 UTC
Source:	https://github.com/hotneim/lg

Help Index

Generate sample from a conditional density estimate
Bandwidth selection for local Gaussian correlation.
Cross-validation for bivariate distributions
Cross-validation for trivariate distributions
Cross-validation for univariate distributions
Plugin bandwidth selection for multivariate data
Plugin bandwidth selection for univariate data
Create simple bandwidth object
Check bandwidth vector
Check bw method
Check bandwidth vector
Check the data and grid
Check the arguments for the dmvnorm_wrapper function
Check estimation method
Check that an object has class "lg"
Test for conditional independence
Calculate the value of the test statistic for the conditional independence test
The locally Gaussian conditional density estimator
Test for financial contagion
Plot local correlation maps
The locally Gaussian density estimator (LGDE)
Bivariate density estimation
Marginal density estimation
Marginal estimates for multivariate data
Trivariate density estimation
Wrapper for dmvnorm
Wrapper for dmvnorm - single point
Auxiliary function for calculating the asymptotic standard deviations for the local Gaussian correlations
Independence tests
Function that calculates the test statistic in the independence tests.
Interpolate a univariate conditional density function
lg: A package for calculating the local Gaussian correlation in multivariate applications.
Create an lg object
Calculate the local conditional covariance between two variables
Auxiliary function for calculating the asymptotic standard deviations for the local Gaussian correlations
Evaluate the multivariate normal
Calculate the local Gaussian partial correlation
Bootstrap replication under the null hypothesis
Transform the marginals of a multivariate data set to standard normality based on the logspline density estimator (Kooperberg and Stone, 1991). See Otneim and Tjøstheim (2017) for details.
Auxiliary function for calculating the local score function u

Generate sample from a conditional density estimate

Description

Generate a sample from a locally Gaussian conditional density estimate using the accept-reject algorithm. If the transform_to_marginal_normality component of the lg_object is TRUE, the replicates will be on the standard normal scale.

Usage

accept_reject(
  lg_object,
  condition,
  n_new,
  nodes,
  M = NULL,
  M_sim = 1500,
  M_corr = 1.5,
  n_corr = 1.2,
  return_just_M = FALSE,
  extend = 0.3
)

Arguments

`lg_object`	An object of type `lg`, as produced by the `lg_main`-function
`condition`	The value of the conditioning variables
`n_new`	The number of observations to generate
`nodes`	Either the number of equidistant nodes to generate, or a vector of nodes supplied by the user
`M`	The value for M in the accept-reject algorithm if already known
`M_sim`	The number of replicates to simulate in order to find a value for M
`M_corr`	Correction factor for M, to be on the safe side
`n_corr`	Correction factor for n_new, so that we mostly will generate enough observations in the first go
`return_just_M`	`TRUE` if we just want to find M, without actually generating any replications.
`extend`	How far to extend the grid beyond the extreme data points when interpolating, in share of the range
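
The roles of `M`, `M_sim`, `M_corr` and `n_corr` are easiest to see in a generic accept-reject sampler. The following is a minimal base R sketch, not the package's internal code: the target `f`, the uniform proposal `g` and the interval bounds are hypothetical choices made only for illustration.

```r
# Generic accept-reject sketch (illustrative only; not the internals of accept_reject()).
set.seed(1)

# Hypothetical unnormalised target density on a bounded interval.
f <- function(x) exp(-x^2 / 2) * (1 + 0.5 * sin(3 * x))
lower <- -4
upper <- 4

# Uniform proposal density on [lower, upper].
g <- function(x) rep(1 / (upper - lower), length(x))

# Find M by simulation, then inflate it by a safety factor (cf. M_sim and M_corr).
M_sim  <- 1500
M_corr <- 1.5
trial  <- runif(M_sim, lower, upper)
M      <- M_corr * max(f(trial) / g(trial))

# Propose a batch and keep the points that pass the acceptance test.
draw_batch <- function(n) {
  prop <- runif(n, lower, upper)
  u    <- runif(n)
  prop[u <= f(prop) / (M * g(prop))]
}

# Propose somewhat more than n_new points per round (cf. n_corr) until enough are accepted.
n_new  <- 200
n_corr <- 1.2
accepted <- numeric(0)
while (length(accepted) < n_new) {
  accepted <- c(accepted, draw_batch(ceiling(n_corr * n_new)))
}
sample_out <- accepted[seq_len(n_new)]
```

A larger `M_corr` makes it less likely that the bound `M` is too small, at the cost of a lower acceptance rate.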

Bandwidth selection for local Gaussian correlation.

Description

Takes a matrix of data points and returns the bandwidths used for estimating the local Gaussian correlations.

Usage

bw_select(
  x,
  bw_method = "plugin",
  est_method = "1par",
  plugin_constant_marginal = 1.75,
  plugin_exponent_marginal = -1/5,
  plugin_constant_joint = 1.75,
  plugin_exponent_joint = -1/6,
  tol_marginal = 10^(-3),
  tol_joint = 10^(-3)
)

Arguments

`x`	A matrix or data frame with data, one column per variable, one row per observation.
`bw_method`	The method used for bandwidth selection. Must be either `"cv"` (cross-validation, slow, but accurate) or `"plugin"` (fast, but crude).
`est_method`	The estimation method, must be either "1par", "5par" or "5par_marginals_fixed", see `lg_main`.
`plugin_constant_marginal`	The constant `c` in `cn^a` used for finding the plugin bandwidth for locally Gaussian marginal density estimates, which we need if estimation method is "5par_marginals_fixed".
`plugin_exponent_marginal`	The constant `a` in `cn^a` used for finding the plugin bandwidth for locally Gaussian marginal density estimates, which we need if estimation method is "5par_marginals_fixed".
`plugin_constant_joint`	The constant `c` in `cn^a` used for finding the plugin bandwidth for estimating the pairwise local Gaussian correlation between two variables.
`plugin_exponent_joint`	The constant `a` in `cn^a` used for finding the plugin bandwidth for estimating the pairwise local Gaussian correlation between two variables.
`tol_marginal`	The absolute tolerance in the optimization for finding the marginal bandwidths when using cross validation.
`tol_joint`	The absolute tolerance in the optimization for finding the joint bandwidths when using cross-validation.

Details

This is the main bandwidth selection function within the framework of locally Gaussian distributions as described in Otneim and Tjøstheim (2017). It takes a data set of arbitrary dimension and calculates the bandwidths needed to find the pairwise local Gaussian correlations. It is mainly called by the lg_main wrapper function.

Value

A list with three elements: marginal contains the bandwidths used for the marginal locally Gaussian estimation, marginal_convergence contains the convergence flags for the marginal bandwidths as returned by the optim function, and joint contains the pairwise bandwidths and convergence flags.

References

Otneim, Håkon, and Dag Tjøstheim. "The locally gaussian density estimator for multivariate data." Statistics and Computing 27, no. 6 (2017): 1595-1616.

Examples

  x <- cbind(rnorm(100), rnorm(100), rnorm(100))
  bw <- bw_select(x)

Cross-validation for bivariate distributions

Description

Uses cross-validation to find the optimal bandwidth for a bivariate locally Gaussian fit

Usage

bw_select_cv_bivariate(
  x,
  tol = 10^(-3),
  est_method = "1par",
  bw_marginal = NULL
)

Arguments

`x`	The matrix of data points.
`tol`	The absolute tolerance in the optimization, used by the `optim`-function.
`est_method`	The estimation method for the bivariate fit. If estimation method is `5par_marginals_fixed`, the marginal bandwidths must be supplied as well through the argument `bw_marginal`. This is automatically handled by the `lg_main` wrapper function.
`bw_marginal`	The bandwidths for estimation of the marginals if method `5par_marginals_fixed` is used.

Details

This function implements the cross-validation algorithm for bandwidth selection described in Otneim & Tjøstheim (2017), Section 4. Let $\hat{f}_h(x)$ be the bivariate locally Gaussian density estimate obtained using the bandwidth $h$. This function then returns the bandwidth that maximizes

$CV(h) = n^{-1} \sum_{i=1}^n \log \hat{f}_h^{(-i)}(x_i),$

where $\hat{f}_h^{(-i)}$ is the density estimate calculated without observation $x_i$.

The recommended use of this function is through the lg_main wrapper function.

Value

The function returns a list with two elements: bw contains the selected bandwidths, and convergence is the convergence flag returned by the optim function.

References

Otneim, Håkon, and Dag Tjøstheim. "The locally gaussian density estimator for multivariate data." Statistics and Computing 27, no. 6 (2017): 1595-1616.

Examples

  ## Not run: 
    x <- cbind(rnorm(100), rnorm(100))
    bw <- bw_select_cv_bivariate(x)
  
## End(Not run)

Cross-validation for trivariate distributions

Description

Uses cross-validation to find the optimal bandwidth for a trivariate locally Gaussian fit

Usage

bw_select_cv_trivariate(x, tol = 10^(-3))

Arguments

`x`	The matrix of data points.
`tol`	The absolute tolerance in the optimization, used by the `optim`-function.

Details

This function implements the cross-validation algorithm for bandwidth selection described in Otneim & Tjøstheim (2017), Section 4, but for trivariate distributions. Let $\hat{f}_h(x)$ be the trivariate locally Gaussian density estimate obtained using the bandwidth $h$. This function then returns the bandwidth that maximizes

$CV(h) = n^{-1} \sum_{i=1}^n \log \hat{f}_h^{(-i)}(x_i),$

where $\hat{f}_h^{(-i)}$ is the density estimate calculated without observation $x_i$.

The recommended use of this function is through the lg_main wrapper function.

Value

The function returns a list with two elements: bw contains the selected bandwidths, and convergence is the convergence flag returned by the optim function.

References

Otneim, Håkon, and Dag Tjøstheim. "The locally gaussian density estimator for multivariate data." Statistics and Computing 27, no. 6 (2017): 1595-1616.

Examples

  ## Not run: 
    x <- cbind(rnorm(100), rnorm(100), rnorm(100))
    bw <- bw_select_cv_trivariate(x)
  
## End(Not run)

Cross-validation for univariate distributions

Description

Uses cross-validation to find the optimal bandwidth for a univariate locally Gaussian fit

Usage

bw_select_cv_univariate(x, tol = 10^(-3))

Arguments

`x`	The vector of data points.
`tol`	The absolute tolerance in the optimization, passed to the `optim`-function using the BFGS-method.

Details

This function provides the univariate version of the cross-validation algorithm for bandwidth selection described in Otneim & Tjøstheim (2017), Section 4. Let $\hat{f}_h(x)$ be the univariate locally Gaussian density estimate obtained using the bandwidth $h$. This function then returns the bandwidth that maximizes

$CV(h) = n^{-1} \sum_{i=1}^n \log \hat{f}_h^{(-i)}(x_i),$

where $\hat{f}_h^{(-i)}$ is the density estimate calculated without observation $x_i$.
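
The leave-one-out criterion $CV(h)$ can be illustrated with a simpler estimator. The sketch below is an assumption-laden stand-in: it uses an ordinary Gaussian kernel density estimate instead of the locally Gaussian fit, and the base R `optimize` function on a hypothetical search interval instead of `optim`. Only the structure of the criterion carries over.

```r
# Leave-one-out likelihood cross-validation, sketched for a plain Gaussian
# kernel density estimate (a stand-in for the locally Gaussian estimate).
set.seed(1)
x <- rnorm(100)

cv_loglik <- function(h, x) {
  n <- length(x)
  k <- dnorm(outer(x, x, "-") / h) / h  # kernel weights K_h(x_i - x_j)
  diag(k) <- 0                          # leave observation i out of its own estimate
  f_loo <- rowSums(k) / (n - 1)         # leave-one-out density at each x_i
  mean(log(f_loo))                      # CV(h)
}

# Maximise CV(h) over a hypothetical search interval.
opt  <- optimize(cv_loglik, interval = c(0.05, 2), x = x, maximum = TRUE)
h_cv <- opt$maximum
```

Setting the diagonal to zero is what implements the superscript $(-i)$: each observation is scored by a density estimate that has not seen it.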

Value

The function returns a list with two elements: bw is the selected bandwidth, and convergence is the convergence flag returned by the optim-function.

References

Otneim, Håkon, and Dag Tjøstheim. "The locally gaussian density estimator for multivariate data." Statistics and Computing 27, no. 6 (2017): 1595-1616.

Examples

  x <- rnorm(100)
  bw <- bw_select_cv_univariate(x)

Plugin bandwidth selection for multivariate data

Description

Returns a plugin bandwidth for multivariate data matrices for the estimation of local Gaussian correlations

Usage

bw_select_plugin_multivariate(x = NULL, n = nrow(x), c = 1.75, a = -1/6)

Arguments

`x`	The data matrix.
`n`	The number of data points. Supply only this if you do not want to pass the entire data matrix.
`c`	A constant, see Details.
`a`	A constant, see Details.

Details

This function takes a data matrix with n rows and returns the real number c*n^a, which is a quick and dirty way of selecting a bandwidth for locally Gaussian density estimation. The number c is set to 1.75 by default, and a = -1/6 is the usual exponent, which stems from the asymptotic convergence rate of the density estimate. This function is usually called from the lg_main wrapper function.
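
The rule c*n^a is simple enough to reproduce directly. The helper below is a hypothetical illustration in base R (the name `plugin_bw_sketch` and the argument name `const` are not part of the package); it only evaluates the formula stated above with the documented defaults.

```r
# Plugin rule c * n^a, written out by hand (hypothetical helper, not a package function).
plugin_bw_sketch <- function(n, const = 1.75, a = -1/6) {
  const * n^a
}

bw_100  <- plugin_bw_sketch(100)   # bandwidth for n = 100
bw_1000 <- plugin_bw_sketch(1000)  # a larger sample gives a smaller bandwidth
```

Because the exponent is negative, the bandwidth shrinks slowly as the sample size grows.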

Value

A number, the selected bandwidth.

Examples

  x <- cbind(rnorm(100), rnorm(100))
  bw <- bw_select_plugin_multivariate(x = x)
  bw <- bw_select_plugin_multivariate(n = 100)

Plugin bandwidth selection for univariate data

Description

Returns a plugin bandwidth for data vectors for use with univariate locally Gaussian density estimation

Usage

bw_select_plugin_univariate(x = NULL, n = length(x), c = 1.75, a = -1/5)

Arguments

`x`	The data vector.
`n`	The number of data points. Can provide only this if we do not want to supply the entire data vector.
`c`	A constant, see Details.
`a`	A constant, see Details.

Details

This function takes a data vector of length n and returns the real number c*n^a, which is a quick and dirty way of selecting a bandwidth for univariate locally Gaussian density estimation. The number c is set to 1.75 by default, and a = -1/5 is the usual exponent, which stems from the asymptotic convergence rate of the density estimate. The recommended use of this function is through the lg_main wrapper function.

Value

A number, the selected bandwidth.

Examples

  x <- rnorm(100)
  bw <- bw_select_plugin_univariate(x = x)
  bw <- bw_select_plugin_univariate(n = 100)

Create simple bandwidth object

Description

Create a simple bandwidth object for local Gaussian correlations

Usage

bw_simple(joint = 1, marg = NA, x = NULL, dim = NULL)

Arguments

`joint`	Joint bandwidth
`marg`	Marginal bandwidths
`x`	The data set
`dim`	The number of variables

Details

This function provides a quick way of producing a bandwidth object that may be used in the lg_main()-function. The user must specify a bandwidth joint that is used for all joint bandwidths, and the user may specify marg, a marginal bandwidth that will be used for all marginal bandwidths. This is needed if the subsequent analyses use est_method = "5par_marginals_fixed".

The function must know the dimension of the problem, which is achieved by either supplying the data set x or the number of variables dim.

Examples


  bw_object <- bw_simple(joint = 1, marg = 1, dim = 3)

Check bandwidth vector

Description

Checks that the bandwidth vector supplied to the bivariate density function is a numeric vector of length 2.

Usage

check_bw_bivariate(bw)

Arguments

`bw`	The bandwidth vector to be checked

Check bw method

Description

Checks that the bandwidth method is one of the allowed values, currently "cv" or "plugin".

Usage

check_bw_method(bw_method)

Arguments

`bw_method`	Check if equal to "cv" or "plugin"

Check bandwidth vector

Description

Checks that the bandwidth vector supplied to the trivariate density function is a numeric vector of length 3.

Usage

check_bw_trivariate(bw)

Arguments

`bw`	The bandwidth vector to be checked

Check the data and grid

Description

Checks that the data or grid provided is of the correct form. This function is an auxiliary function that can quickly check that a supplied data set or grid is a matrix or a data frame, and that it has the correct dimension, as defined by the dim_check parameter. The type argument is simply a character vector "data" or "grid" that is used for printing error messages.

Usage

check_data(x, dim_check = NA, type)

Arguments

`x`	Data or grid
`dim_check`	How many columns do we expect?
`type`	Either "grid" or "data"; used only when printing error messages.
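
The kind of check described here can be sketched in a few lines of base R. The function below is a hypothetical re-creation written from the description alone (the name `check_data_sketch` and the error messages are assumptions), so the package's own function may differ in its details.

```r
# Hypothetical sketch of a data/grid check, written from the description above.
check_data_sketch <- function(x, dim_check = NA, type) {
  # The input must be a matrix or a data frame.
  if (!is.matrix(x) && !is.data.frame(x)) {
    stop("The ", type, " must be a matrix or a data frame")
  }
  # If an expected number of columns is given, it must match.
  if (!is.na(dim_check) && ncol(x) != dim_check) {
    stop("The ", type, " must have ", dim_check, " columns")
  }
  invisible(TRUE)
}

# A valid grid passes silently; a plain vector triggers an error message.
ok  <- check_data_sketch(matrix(0, nrow = 5, ncol = 2), dim_check = 2, type = "grid")
bad <- tryCatch(check_data_sketch(1:10, dim_check = 2, type = "data"),
                error = function(e) conditionMessage(e))
```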

Check the arguments for the `dmvnorm_wrapper` function

Description

Checks that the arguments provided to the dmvnorm_wrapper-function are numerical vectors, all having the same lengths.

Usage

check_dmvnorm_arguments(eval_points, mu_1, mu_2, sig_1, sig_2, rho)

Arguments

`eval_points`	A `kx2` matrix with evaluation points
`mu_1`	The first expectation vector
`mu_2`	The second expectation vector
`sig_1`	The first standard deviation vector
`sig_2`	The second standard deviation vector
`rho`	The correlation vector

Check estimation method

Description

Checks that the estimation method is one of the allowed values, currently "1par", "5par" and "5par_marginals_fixed".

Usage

check_est_method(est_method)

Arguments

`est_method`	Check if equal to a valid value

Check that an object has class "lg"

Description

Checks that the provided object has class lg.

Usage

check_lg(check_object)

Arguments

`check_object`	The object to be checked

Test for conditional independence

Description

Perform a test for conditional independence between the first two variables in the data set, given the remaining variables.

Usage

ci_test(
  lg_object,
  h = function(x) x^2,
  S = function(y) rep(T, nrow(y)),
  n_rep = 500,
  nodes = 100,
  M = NULL,
  M_sim = 1500,
  M_corr = 1.5,
  n_corr = 1.2,
  extend = 0.3,
  return_time = TRUE
)

Arguments

`lg_object`	An object of type `lg`, as produced by the `lg_main`-function
`h`	The `h`-function used in the calculation of the test statistic. The default value is `h(x) = x^2`.
`S`	The integration area in the test statistic. Logical function that takes grid points as argument.
`n_rep`	The number of replicated bootstrap samples
`nodes`	Either the number of equidistant nodes to generate, or a vector of nodes supplied by the user
`M`	The value for M in the accept-reject algorithm if already known
`M_sim`	The number of replicates to simulate in order to find a value for M
`M_corr`	Correction factor for M, to be on the safe side
`n_corr`	Correction factor for n_new, so that we mostly will generate enough observations in the first go
`extend`	How far to extend the grid beyond the extreme data points when interpolating, in share of the range
`return_time`	Measure how long the test takes to run, and return along with the test result
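
Both `h` and `S` are ordinary R functions, so alternatives to the defaults can be written by hand. The sketch below uses hypothetical choices (`h_abs` and `S_box` are illustration names, not package objects) and only checks, in base R, that `S_box` returns one logical value per grid point, which is what the default `S` does.

```r
# Hypothetical alternatives to the defaults h(x) = x^2 and S(y) = TRUE everywhere.
h_abs <- function(x) abs(x)

# Restrict the integration area to the box [-2, 2]^d:
# one logical value per row (grid point) of y.
S_box <- function(y) apply(abs(y) <= 2, 1, all)

# Try S_box on a small two-dimensional grid.
grid <- as.matrix(expand.grid(seq(-3, 3, by = 1), seq(-3, 3, by = 1)))
keep <- S_box(grid)

# Intended use, assuming an existing lg_object (not run here):
# ci_test(lg_object, h = h_abs, S = S_box)
```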

Calculate the value of the test statistic for the conditional independence test

Description

Calculate the test statistic in the test for conditional independence between the first two variables in the data set, given the remaining variables.

Usage

ci_test_statistic(
  lg_object,
  h = function(x) x^2,
  S = function(y) rep(T, nrow(y))
)

Arguments

`lg_object`	An object of type `lg`, as produced by the `lg_main`-function
`h`	The `h`-function used in the calculation of the test statistic. The default value is `h(x) = x^2`.
`S`	The integration area in the test statistic. Logical function that takes grid points as argument.

The locally Gaussian conditional density estimator

Description

Estimate a conditional density function using locally Gaussian approximations.

Usage

clg(
  lg_object,
  grid = NULL,
  condition = NULL,
  normalization_points = NULL,
  fixed_grid = NULL
)

Arguments

`lg_object`	An object of type `lg`, as produced by the `lg_main`-function.
`grid`	A matrix of grid points, where we want to evaluate the density estimate. Number of columns must be the same as number of variables in X1.
`condition`	A vector with conditions for the variables that we condition upon. Length of this vector must be the same as the number of variables in X2. The function will throw an error if there is any discrepancy in the dimensions of the `grid`, `condition` and data set.
`normalization_points`	The number of grid points used to approximate the integral of the density estimate for normalization.
`fixed_grid`	Not used presently.

Details

This function is the conditional version of the locally Gaussian density estimator (LGDE), described in Otneim & Tjøstheim (2018). The function takes as arguments an lg-object as produced by the main lg_main function, a grid of points where the density estimate should be evaluated, and a set of conditions.

The variables must be sorted before they are supplied to this function. It will always assume that the free variables come before the conditioning variables.

Assume that X is a stochastic vector with two components X1 and X2. This function will thus estimate the conditional density of X1 given a specified value of X2.
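
For intuition about conditioning X1 on X2, it helps to recall the textbook formulas for a globally Gaussian vector: the conditional mean is mu_1 + S12 S22^{-1} (x2 - mu_2) and the conditional covariance is S11 - S12 S22^{-1} S21. The base R sketch below evaluates them for a hypothetical trivariate Gaussian. It is not the locally Gaussian estimator itself, whose local conditional means and covariances are defined in the referenced paper.

```r
# Textbook Gaussian conditioning for a hypothetical trivariate normal vector.
# Free variable: X1. Conditioning variables: X2 = 0, X3 = 1.
mu    <- c(0, 0, 0)
Sigma <- matrix(c(1.0, 0.5, 0.3,
                  0.5, 1.0, 0.2,
                  0.3, 0.2, 1.0), nrow = 3, byrow = TRUE)
free   <- 1
cond   <- 2:3
x_cond <- c(0, 1)

S11 <- Sigma[free, free, drop = FALSE]
S12 <- Sigma[free, cond, drop = FALSE]
S22 <- Sigma[cond, cond, drop = FALSE]

# Conditional mean: mu_1 + S12 S22^{-1} (x_2 - mu_2)
c_mean <- mu[free] + S12 %*% solve(S22, x_cond - mu[cond])

# Conditional covariance: S11 - S12 S22^{-1} S21
c_cov <- S11 - S12 %*% solve(S22, t(S12))
```

The free variables come first in `Sigma`, mirroring the ordering that this function assumes for its input data.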

Value

A list containing the conditional density estimate as well as all the running parameters that have been used. The elements are:

f_est: The estimated conditional density.
c_mean: The estimated local conditional means as defined in equation (10) of Otneim & Tjøstheim (2017).
c_cov: The estimated local conditional covariance matrices as defined in equation (11) of Otneim & Tjøstheim (2017).
x: The data set.
bw: The bandwidth object.
transformed_data: The data transformed to approximate marginal standard normality (if selected).
normalizing_constants: The normalizing constants used to transform data and grid back and forth to the marginal standard normality scale, as seen in eq. (8) of Otneim & Tjøstheim (2017) (if selected).
grid: The grid where the estimation was performed, on the original scale.
transformed_grid: The grid where the estimation was performed, on the marginal standard normal scale.
normalization_points: The number of grid points used to approximate the integral of the density estimate for normalization.
normalization_constant: If approximated, the integral of the non-normalized density estimate. NA if not normalized.
density_normalized: Logical, indicates whether the final density estimate (contained in f_est) has been approximately normalized to have unit integral.

References

Otneim, Håkon, and Dag Tjøstheim. "Conditional density estimation using the local Gaussian correlation" Statistics and Computing 28, no. 2 (2018): 303-321.

Examples

  # A 3 variate example
  x <- cbind(rnorm(100), rnorm(100), rnorm(100))

  # Generate the lg-object with default settings
  lg_object <- lg_main(x)

  # Estimate the conditional density of X1|X2 = 0, X3 = 1 on a small grid
  cond_dens <- clg(lg_object, grid = matrix(-4:4, ncol = 1), condition = c(0, 1))

Test for financial contagion

Description

Test for financial contagion by means of the local Gaussian correlation.

Usage

cont_test(
  lg_object_nc,
  lg_object_c,
  grid_range = quantile(rbind(lg_object_nc$x, lg_object_c$x), c(0.05, 0.95)),
  grid_length = 30,
  n_rep = 1000,
  weight = function(y) { rep(1, nrow(y)) }
)

Arguments

`lg_object_nc`	An object of type `lg`, as produced by the `lg_main`-function for the observations covering the non-crisis period. The data must be two dimensional.
`lg_object_c`	An object of type `lg`, as produced by the `lg_main`-function for the observations covering the crisis period. The data must be two dimensional.
`grid_range`	This test measures the local correlations along the diagonal specified by this vector of length two.
`grid_length`	The number of grid points.
`n_rep`	The number of bootstrap replicates.
`weight`	Weight function
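
To make `grid_range`, `grid_length` and `weight` concrete, the base R sketch below builds a diagonal grid from the default quantile range and evaluates a custom weight function on it. How the package constructs its grid internally is an assumption here, and `w_lower` is a hypothetical weight that keeps only the lower-left part of the diagonal.

```r
# Sketch: a diagonal grid and a custom weight function (assumed construction).
x <- apply(EuStockMarkets, 2, function(x) diff(log(x)))[, 1:2]
non_crisis <- x[1:100, ]
crisis     <- x[101:200, ]

# Default range: 5% and 95% quantiles of the pooled observations.
grid_range  <- quantile(rbind(non_crisis, crisis), c(0.05, 0.95))
grid_length <- 30

# Equidistant points along the diagonal between the two end points.
g         <- seq(grid_range[1], grid_range[2], length.out = grid_length)
diag_grid <- cbind(g, g)

# Hypothetical weight: 1 for grid points with negative returns, 0 otherwise.
w_lower <- function(y) as.numeric(y[, 1] < 0)
w       <- w_lower(diag_grid)
```

Like the default weight, `w_lower` takes a matrix of grid points and returns one number per row.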

Details

This function is an implementation of the test for financial contagion developed by Støve, Tjøstheim and Hufthammer (2014). They test whether the local correlations between two financial time series are different before and during crisis times. The distinction between crisis and non-crisis times must be made by the user.

Value

A list containing the test result as well as various parameters. The elements are:

observed: The observed value of the test statistic.
replicated: The replicated values of the test statistic.
p_value: The p-value of the test.
local_correlations: The local correlations measured along the diagonal, for the non-crisis and crisis periods respectively.

References

Støve, Bård, Dag Tjøstheim, and Karl Ove Hufthammer. "Using local Gaussian correlation in a nonlinear re-examination of financial contagion." Journal of Empirical Finance 25 (2014): 62-82.

Examples


   # Run the test on some built-in stock data
   data(EuStockMarkets)
   x <- apply(EuStockMarkets, 2, function(x) diff(log(x)))[, 1:2]

   # Define the crisis and non-crisis periods (arbitrarily for this simple
   # example)
   non_crisis <- x[1:100, ]
   crisis     <- x[101:200, ]

   # Create the lg-objects, with parameters that match the applications in the
   # original publication describing the test
   lg_object_nc <- lg_main(non_crisis, est_method = "5par",
                           transform_to_marginal_normality = FALSE)
   lg_object_c  <- lg_main(crisis, est_method = "5par",
                           transform_to_marginal_normality = FALSE)

   ## Not run: 
   # Run the test (with very few resamples for illustration)
   test_result <- cont_test(lg_object_nc, lg_object_c,
                            n_rep = 10)
   
## End(Not run)

Plot local correlation maps

Description

Plot the estimated local correlation map (or local partial correlation map) for a pair of variables

Usage

corplot(
  dlg_object,
  pair = 1,
  gaussian_scale = FALSE,
  plot_colormap = TRUE,
  plot_obs = FALSE,
  plot_labels = TRUE,
  plot_legend = FALSE,
  plot_thres = 0,
  alpha_tile = 0.8,
  alpha_point = 0.8,
  low_color = "blue",
  high_color = "red",
  break_int = 0.2,
  label_size = 3,
  font_family = "sans",
  point_size = NULL,
  xlim = NULL,
  ylim = NULL,
  xlab = NULL,
  ylab = NULL,
  rholab = NULL,
  main = NULL,
  subtitle = NULL
)

Arguments

`dlg_object`	The density estimation object produced by the dlg-function
`pair`	Integer indicating which pair of variables you want to plot. The function looks up the corresponding variables in the bandwidth object used to calculate the dlg object, and you can inspect this in `dlg_object$bw$joint`. Defaults to 1 (the first pair, usually variable 1 against variable 2).
`gaussian_scale`	Logical, if `TRUE` the plot is produced on the marginal standard Gaussian scale.
`plot_colormap`	Logical, if `TRUE` the plot includes a colormap to visualize the value of the local correlation.
`plot_obs`	Logical, if `TRUE` the observations are plotted.
`plot_labels`	Logical, if `TRUE` character labels with local correlation values are plotted.
`plot_legend`	Logical, if `TRUE` a color legend is plotted.
`plot_thres`	A number between 0 and 1 indicating the threshold value to be used for not plotting the estimated local correlation in areas with no data. Uses a quick bivariate kernel density estimate as a criterion, and skips plotting in areas where the kernel density estimate is less than the fraction plot_thres of the maximum density estimate. If 0 (default), everything is plotted; if 1, nothing is plotted. Typical values may be in the 0.001-0.01 range.
`alpha_tile`	The alpha-value indicating the transparency of the color tiles. Number between 0 (transparent) and 1 (not transparent).
`alpha_point`	The alpha-value indicating the transparency of the observations. Number between 0 (transparent) and 1 (not transparent).
`low_color`	The color corresponding to correlation equal to -1 (default: blue).
`high_color`	The color corresponding to correlation equal to 1 (default: red).
`break_int`	Break interval in the color gradient.
`label_size`	Size of text labels, if plotted.
`font_family`	Font family used for text labels, if plotted.
`point_size`	Size of points used for plotting the observations.
`xlim`	x-limits
`ylim`	y-limits
`xlab`	x-label
`ylab`	y-label
`rholab`	Label for the legend, if plotted
`main`	Title of plot
`subtitle`	Subtitle of plot

Details

This function plots a map of estimated local Gaussian correlations of a specified pair (defaults to the first pair) of variables as produced by the dlg function. This plot is heavily inspired by the local correlation plots produced by the 'localgauss' package by Berentsen et al. (2014), but it is here more easily customized and specially adapted to the ecosystem within the lg package. The plotting is carried out using the ggplot2 package (Wickham, 2009). This function also accepts objects created by the partial_cor() function, in order to create local partial correlation maps.
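
The effect of `plot_thres` can be sketched without any plotting. The base R code below is an illustration under stated assumptions: the product-Gaussian kernel estimate, the bandwidth `h` and the grid are hypothetical choices, and the package may compute its quick kernel density estimate differently. It shows only how tiles with a density below the fraction `plot_thres` of the maximum would be masked.

```r
# Sketch of density-based masking (assumed kernel estimate, not corplot() internals).
set.seed(1)
obs <- cbind(rnorm(200), rnorm(200))

gx <- seq(-4, 4, length.out = 25)
gy <- seq(-4, 4, length.out = 25)
h  <- 0.5  # hypothetical kernel bandwidth

# Product-Gaussian kernel density estimate at a single point (u, v).
kde_at <- function(u, v) {
  mean(dnorm((u - obs[, 1]) / h) * dnorm((v - obs[, 2]) / h)) / h^2
}
kde <- outer(gx, gy, Vectorize(kde_at))

# Keep only tiles whose density reaches the fraction plot_thres of the maximum.
plot_thres <- 0.01
show_tile  <- kde >= plot_thres * max(kde)
```

Tiles far from the bulk of the observations fall below the threshold and would be left blank in the map.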

References

Berentsen, G. D., Kleppe, T. S., & Tjøstheim, D. (2014). Introducing localgauss, an R package for estimating and visualizing local Gaussian correlation. Journal of Statistical Software, 56(1), 1-18.

H. Wickham. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York, 2009.

The locally Gaussian density estimator (LGDE)

Description

Estimate a multivariate density function using locally Gaussian approximations

Usage

dlg(
  lg_object,
  grid,
  level = 0.95,
  normalization_points = NULL,
  bootstrap = F,
  B = 500
)

Arguments

`lg_object`	An object of type `lg`, as produced by the `lg_main`-function.
`grid`	A matrix of grid points, where we want to evaluate the density estimate.
`level`	Specify a level if asymptotic standard deviations and confidence intervals should be returned.
`normalization_points`	The number of grid points used to approximate the integral of the density estimate for normalization.
`bootstrap`	Calculate bootstrapped confidence intervals instead.
`B`	Number of bootstrap replications if using bootstrapped confidence intervals.

Details

This function does multivariate density estimation using the locally Gaussian density estimator (LGDE), which was introduced by Otneim & Tjøstheim (2017). The function takes as arguments an lg-object as produced by the main lg_main-function (where all the running parameters are specified), and a grid of points where the density estimate should be evaluated.
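The grid is a matrix with one column per variable and one row per evaluation point. The Examples section of this help page evaluates along the diagonal only. The following base R sketch shows, as an illustration, how a full rectangular grid could be built instead; the object names are hypothetical.

```r
# Marginal grid values shared by all three variables
g <- seq(-2, 2, by = 1)

# All combinations of the marginal values: 5^3 = 125 rows, 3 columns
full_grid <- as.matrix(expand.grid(x1 = g, x2 = g, x3 = g))

# A diagonal grid, as in the Examples section: 5 rows, 3 columns
diag_grid <- cbind(g, g, g)
```

Either matrix could then be supplied as the `grid` argument. Note that the number of rows of a full grid grows quickly with the dimension, and one local fit is needed per row.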

Value

A list containing the density estimate as well as all the running parameters that have been used. The elements are:

f_est: The estimated multivariate density.
loc_mean: The estimated local means if est_method is "5par" or "5par_marginals_fixed", a matrix of zeros if est_method is "1par".
loc_sd: The estimated local standard deviations if est_method is "5par" or "5par_marginals_fixed", a matrix of ones if est_method is "1par".
loc_cor: Matrix of estimated local correlations, one column for each pair of variables, in the same order as specified in the bandwidth object.
x: The data set.
bw: The bandwidth object.
transformed_data: The data transformed to approximate marginal standard normality.
normalizing_constants: The normalizing constants used to transform data and grid back and forth to the marginal standard normality scale, as seen in eq. (8) of Otneim & Tjøstheim (2017).
grid: The grid where the estimation was performed, on the original scale.
transformed_grid: The grid where the estimation was performed, on the marginal standard normal scale.
normalization_points: Number of grid points used to approximate the integral of the density estimate, in order to normalize.
normalization_constant: If approximated, the integral of the non-normalized density estimate. NA if not normalized.
density_normalized: Logical, indicates whether the final density estimate (contained in f_est) has been approximately normalized to have unit integral.
loc_cor_sd: Estimated asymptotic standard deviation for the local correlations.
loc_cor_lower: Lower confidence limit based on the asymptotic standard deviation.
loc_cor_upper: Upper confidence limit based on the asymptotic standard deviation.
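As an illustration of how confidence limits relate to an estimate and its asymptotic standard deviation, here is a minimal base R sketch. It assumes the usual normal-approximation interval; whether dlg computes its limits in exactly this way is an assumption, and all input values below are made up.

```r
# Hypothetical local correlation estimates and asymptotic standard deviations
loc_cor    <- c(0.10, 0.45, -0.30)
loc_cor_sd <- c(0.05, 0.08, 0.06)
level      <- 0.95

# Two-sided standard normal quantile for the chosen level
z <- qnorm(1 - (1 - level) / 2)

# Symmetric limits around each estimate
loc_cor_lower <- loc_cor - z * loc_cor_sd
loc_cor_upper <- loc_cor + z * loc_cor_sd
```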

References

Otneim, Håkon, and Dag Tjøstheim. "The locally gaussian density estimator for multivariate data." Statistics and Computing 27, no. 6 (2017): 1595-1616.

Examples

   x <- cbind(rnorm(100), rnorm(100), rnorm(100))
   lg_object <- lg_main(x)  # Put all the running parameters in here.
   grid <- cbind(seq(-4, 4, 1), seq(-4, 4, 1), seq(-4, 4, 1))
   density_estimate <- dlg(lg_object, grid = grid)

Bivariate density estimation

Description

dlg_bivariate returns the locally Gaussian density estimate of a bivariate distribution on a given grid.

Usage

dlg_bivariate(
  x,
  eval_points = NA,
  grid_size = 15,
  bw = c(1, 1),
  est_method = "1par",
  tol = .Machine$double.eps^0.25/10^4,
  run_checks = TRUE,
  marginal_estimates = NA,
  bw_marginal = NA
)

Arguments

`x`	The data matrix (or data frame). Must have exactly 2 columns.
`eval_points`	The grid where the density should be estimated. Must have exactly 2 columns.
`grid_size`	If `eval_points` is not supplied, then the function will create a suitable grid diagonally through the data, with this many grid points.
`bw`	The two bandwidths, a numeric vector of length 2.
`est_method`	The estimation method, must either be "1par" for estimation with just the local correlation, or "5par" for a full locally Gaussian fit with all 5 parameters.
`tol`	The numerical tolerance to be used in the optimization. Only applicable in the 1-parameter optimization.
`run_checks`	Logical. Should sanity checks be run on the arguments? Useful to disable this when doing cross-validation for example.
`marginal_estimates`	Provide the marginal estimates here if estimation method is "`5par_marginals_fixed`", and the marginal estimates have already been found. Useful for cross-validation. List with two elements as returned by `dlg_marginal_wrapper`.
`bw_marginal`	Vector of bandwidths used to estimate the marginal distributions.

Details

This function serves as the backbone in the body of methods concerning local Gaussian correlation. It takes a bivariate data set, x, and a bivariate set of grid points, eval_points, and returns the bivariate, locally Gaussian density estimate at these points. We also need a vector of bandwidths, bw, with two elements, and an estimation method, est_method.
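To make the idea behind the "1par" method concrete, the following base R sketch estimates a single local correlation at one grid point by maximizing a kernel-weighted local likelihood with a Gaussian product kernel. It is an illustration under stated assumptions, not the package's implementation: the helper names `log_dbvnorm0` and `local_cor_1par` are hypothetical, and the data are assumed to be marginally standard normal.

```r
# Log-density of a bivariate normal with zero means, standard deviations
# s1 and s2, and correlation rho (vectorised over x1 and x2).
log_dbvnorm0 <- function(x1, x2, s1, s2, rho) {
  u <- x1 / s1
  v <- x2 / s2
  -(u^2 - 2 * rho * u * v + v^2) / (2 * (1 - rho^2)) -
    log(2 * pi * s1 * s2) - 0.5 * log(1 - rho^2)
}

# Hypothetical helper: local correlation at `point` for the n x 2 matrix `z`
# with bandwidths `h`. Means and standard deviations are fixed at 0 and 1.
local_cor_1par <- function(z, point, h) {
  # Gaussian product kernel weights centred at the evaluation point
  w <- dnorm(z[, 1], point[1], h[1]) * dnorm(z[, 2], point[2], h[2])
  s1 <- sqrt(1 + h[1]^2)
  s2 <- sqrt(1 + h[2]^2)
  objective <- function(rho) {
    # Kernel-weighted log-likelihood term
    fit <- mean(w * log_dbvnorm0(z[, 1], z[, 2], 1, 1, rho))
    # Penalty term: integral of kernel times model density, which has a
    # closed form because both factors are Gaussian
    penalty <- exp(log_dbvnorm0(point[1], point[2], s1, s2, rho / (s1 * s2)))
    fit - penalty
  }
  optimise(objective, interval = c(-0.99, 0.99), maximum = TRUE)$maximum
}

# Simulated data with correlation 0.8
set.seed(1)
z1 <- rnorm(500)
z2 <- 0.8 * z1 + sqrt(1 - 0.8^2) * rnorm(500)
rho_hat <- local_cor_1par(cbind(z1, z2), point = c(0, 0), h = c(1, 1))
```

Repeating the optimization at every row of the grid gives one local correlation per grid point. For jointly Gaussian data, as simulated here, the local correlation is the same everywhere.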

Value

A list including the data set $x, the grid $eval_points, the bandwidths $bw, as well as a matrix of parameter estimates $par_est and the estimated bivariate density $f_est.

Examples

  x <- cbind(rnorm(100), rnorm(100))
  bw <- c(1, 1)
  eval_points <- cbind(seq(-4, 4, 1), seq(-4, 4, 1))

  estimate <- dlg_bivariate(x, eval_points = eval_points, bw = bw)

Marginal density estimation

Description

Function that estimates a univariate density by local Gaussian approximations, as described in Hufthammer and Tjøstheim (2009).

Usage

dlg_marginal(
  x,
  bw = 1,
  eval_points = seq(quantile(x, 0.01), quantile(x, 0.99), length.out = grid_size),
  grid_size = 15
)

Arguments

`x`	The data vector.
`bw`	The bandwidth (a single number).
`eval_points`	The grid where we want to evaluate the density. Chosen suitably if not provided, with length equal to grid_size.
`grid_size`	Number of grid points if grid is not provided.

Details

This function is mainly meant to be used as a tool in multivariate analysis, as a way to obtain an estimate of a univariate (marginal) density function, but it can of course be used in general to estimate univariate densities.
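The following base R sketch illustrates what a univariate local Gaussian fit at a single point can look like: a local mean and a local standard deviation are found by maximizing a kernel-weighted local likelihood, and the density estimate is the fitted normal density at that point. The helper `local_gaussian_density` is hypothetical, and the optimizer and starting values are assumptions made for the illustration rather than the package's own choices.

```r
# Hypothetical helper: local Gaussian fit of the density of `x` at `x0`
# with bandwidth `h`.
local_gaussian_density <- function(x, x0, h) {
  w <- dnorm(x, mean = x0, sd = h)  # Gaussian kernel weights
  neg_local_loglik <- function(par) {
    mu <- par[1]
    sigma <- exp(par[2])  # optimise over log(sigma) to keep sigma > 0
    # Kernel-weighted log-likelihood term
    fit <- mean(w * dnorm(x, mu, sigma, log = TRUE))
    # Integral of kernel times N(mu, sigma^2) density, in closed form
    penalty <- dnorm(x0, mu, sqrt(sigma^2 + h^2))
    -(fit - penalty)
  }
  # Start the search at the global mean and standard deviation
  opt <- optim(c(mean(x), log(sd(x))), neg_local_loglik)
  mu_hat <- opt$par[1]
  sigma_hat <- exp(opt$par[2])
  c(mu = mu_hat, sigma = sigma_hat, f_est = dnorm(x0, mu_hat, sigma_hat))
}

set.seed(1)
x <- rnorm(1000)
fit0 <- local_gaussian_density(x, x0 = 0, h = 1)
```

The element `fit0[["f_est"]]` is the density estimate at 0, and `mu` and `sigma` are the local parameters at that point.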

Value

A list including the data set $x, the grid $eval_points, the bandwidth $bw, as well as a matrix of parameter estimates $par_est and the estimated univariate density $f_est.

References

Hufthammer, Karl Ove, and Dag Tjøstheim. "Local Gaussian Likelihood and Local Gaussian Correlation" PhD Thesis of Karl Ove Hufthammer, University of Bergen, 2009.

Examples

  x <- rnorm(100)
  estimate <- dlg_marginal(x, bw = 1, eval_points = -4:4)

Marginal estimates for multivariate data

Description

Estimates the marginal locally Gaussian parameters for a multivariate data set

Usage

dlg_marginal_wrapper(data_matrix, eval_matrix, bw_vector)

Arguments

`data_matrix`	The matrix of data points. One column constitutes an observation vector.
`eval_matrix`	The matrix of evaluation points. One column constitutes a vector of grid points.
`bw_vector`	The vector of bandwidths, one element per component.

Details

This function takes a matrix of observations, a matrix of evaluation points and a vector of bandwidths, and does a locally Gaussian fit on each of the marginals using the dlg_marginal-function. It assumes that the data and evaluation points are organized column-wise in matrices, and that the bandwidth for each column is found in the corresponding element of the bandwidth vector. The primary use for this function is multivariate density estimation using the "5par_marginals_fixed"-method.

Value

A list with marginal parameter and density estimates as provided by the dlg_marginal-function. One element per column in the data.

Examples

  data_matrix <- cbind(rnorm(100), rnorm(100))
  eval_matrix <- cbind(seq(-4, 4, 1), seq(-4, 4, 1))
  bw <- c(1, 1)

  estimate <- dlg_marginal_wrapper(data_matrix, eval_matrix = eval_matrix, bw_vector = bw)

Trivariate density estimation

Description

dlg_trivariate returns the locally Gaussian density estimate of a trivariate distribution on a given grid.

Usage

dlg_trivariate(
  x,
  eval_points = NULL,
  grid_size = 15,
  bw = c(1, 1, 1),
  est_method = "trivariate",
  run_checks = TRUE
)

Arguments

`x`	The data matrix (or data frame). Must have exactly 3 columns.
`eval_points`	The grid where the density should be estimated. Must have exactly 3 columns.
`grid_size`	If `eval_points` is not supplied, then the function will create a suitable grid diagonally through the data, with this many grid points.
`bw`	The three bandwidths, a numeric vector of length 3.
`est_method`	The estimation method, which for this function is fixed at "trivariate". Included only for compatibility with the other methods in this package.
`run_checks`	Logical. Should sanity checks be run on the arguments? Useful to disable this when doing cross-validation for example.

Details

In some applications it may be desired to produce a full locally Gaussian fit of a trivariate density function without having to resort to bivariate approximations. This function takes a trivariate data set, x, and a trivariate set of grid points eval_points, and returns the trivariate, locally Gaussian density estimate in these points. We also need a vector of bandwidths, bw, with three elements, and an estimation method est_method, which in this case is fixed at "trivariate", and included only to be fully compatible with the other methods in this package.

This function will only work on the marginally standard normal scale! Please use the wrapper function dlg() for density estimation. This will ensure that all parameters have proper values.

Value

A list including the data set $x, the grid $eval_points, the bandwidths $bw, as well as a matrix of parameter estimates $par_est and the estimated trivariate density $f_est.

Examples

  x <- cbind(rnorm(100), rnorm(100), rnorm(100))
  bw <- c(1, 1, 1)
  eval_points <- cbind(seq(-4, 4, 1), seq(-4, 4, 1), seq(-4, 4, 1))

  estimate <- dlg_trivariate(x, eval_points = eval_points, bw = bw)

Wrapper for `dmvnorm`

Description

dmvnorm_wrapper is a function that evaluates the bivariate normal distribution in a matrix of evaluation points, with local parameters.

Usage

dmvnorm_wrapper(
  eval_points,
  mu_1 = rep(0, nrow(eval_points)),
  mu_2 = rep(0, nrow(eval_points)),
  sig_1 = rep(1, nrow(eval_points)),
  sig_2 = rep(1, nrow(eval_points)),
  rho = rep(0, nrow(eval_points)),
  run_checks = TRUE
)

Arguments

`eval_points`	A `kx2` matrix with evaluation points
`mu_1`	The first expectation vector
`mu_2`	The second expectation vector
`sig_1`	The first standard deviation vector
`sig_2`	The second standard deviation vector
`rho`	The correlation vector
`run_checks`	Run sanity check for the arguments

Details

This function takes as arguments a matrix of grid points and vectors of parameter values, and returns the bivariate normal density at these points, with these parameter values.
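The computation described above, a bivariate normal density with a separate parameter set for each evaluation point, can be sketched in base R as follows. The function `bvnorm_pointwise` is a hypothetical stand-in written for this illustration, not the package's code.

```r
# Hypothetical stand-in: bivariate normal density where row i of
# eval_points is evaluated with the i-th element of each parameter vector.
bvnorm_pointwise <- function(eval_points, mu_1, mu_2, sig_1, sig_2, rho) {
  u <- (eval_points[, 1] - mu_1) / sig_1
  v <- (eval_points[, 2] - mu_2) / sig_2
  q <- (u^2 - 2 * rho * u * v + v^2) / (1 - rho^2)
  exp(-q / 2) / (2 * pi * sig_1 * sig_2 * sqrt(1 - rho^2))
}

eval_points <- cbind(seq(-2, 2, 1), seq(-2, 2, 1))
k <- nrow(eval_points)

# A different correlation at each of the k evaluation points
dens <- bvnorm_pointwise(eval_points,
                         mu_1 = rep(0, k), mu_2 = rep(0, k),
                         sig_1 = rep(1, k), sig_2 = rep(1, k),
                         rho = seq(-0.5, 0.5, length.out = k))
```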

Wrapper for `dmvnorm` - single point

Description

Function that evaluates the bivariate normal in a single point

Usage

dmvnorm_wrapper_single(x1, x2, mu_1, mu_2, sig_1, sig_2, rho)

Arguments

`x1`	The first component of the evaluation point
`x2`	The second component of the evaluation point
`mu_1`	The first expectation
`mu_2`	The second expectation
`sig_1`	The first standard deviation
`sig_2`	The second standard deviation
`rho`	The correlation

Auxiliary function for calculating the asymptotic standard deviations for the local Gaussian correlations

Description

Auxiliary function for calculating the asymptotic standard deviations for the local Gaussian correlations

Usage

gradient(sigma, sigma_k)

Arguments

`sigma`	sigma
`sigma_k`	sigma_k

Independence tests

Description

Independence tests based on the local Gaussian correlation

Usage

ind_test(
  lg_object,
  h = function(x) x^2,
  S = function(y) as.logical(rep(1, nrow(y))),
  bootstrap_type = "plain",
  block_length = NULL,
  n_rep = 1000
)

Arguments

`lg_object`	An object of type `lg`, as produced by the `lg_main`-function. The data must be two dimensional.
`h`	The `h`-function used in the calculation of the test statistic. The default value is `h(x) = x^2`.
`S`	The integration area for the test statistic. Must be a logical function that accepts an n x 2 matrix and returns TRUE if a row is in S.
`bootstrap_type`	The bootstrap method. Choose "plain" for the ordinary nonparametric bootstrap valid for independence test for iid data and for serial dependence within a time series. Choose "stationary" or "block" for a test for cross dependence between two time series.
`block_length`	Block length if using block bootstrap for the cross dependence test. Calculated by `np::b.star()` if not supplied.
`n_rep`	Number of bootstrap replications.
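As an illustration of the `h` and `S` arguments, the sketch below defines an alternative to each default in base R. The names `S_lower_left` and `h_abs` are hypothetical, and the particular choices are examples only, not recommendations.

```r
# Integration area: keep only rows in the lower-left quadrant.
# Accepts an n x 2 matrix and returns one logical value per row.
S_lower_left <- function(y) y[, 1] < 0 & y[, 2] < 0

# Alternative h-function: absolute value instead of the default square
h_abs <- function(x) abs(x)

# Check the integration area on three points
y <- cbind(c(-1, 1, -0.5), c(-2, -1, 0.5))
in_area <- S_lower_left(y)
```

Given an lg-object, these could be supplied as `ind_test(lg_object, h = h_abs, S = S_lower_left)`.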

Details

Implementation of three independence tests: for iid data (Berentsen and Tjøstheim, 2014), for serial dependence within a time series (Lacal and Tjøstheim, 2017a), and for serial cross-dependence between two time series (Lacal and Tjøstheim, 2017b). The first test has a different theoretical foundation than the latter two, but the implementations are similar and differ only in the bootstrap procedure. For the time series applications, the user must lag the series as needed before creating the lg_object and calling this function.
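Lagging the series before creating the lg-object amounts to pairing each observation with the observation k steps later. A minimal base R sketch follows; the helper `lag_pair` is hypothetical and not part of the package.

```r
# Hypothetical helper: pair x_t with y_{t+k}. With y = x this gives the
# lag-k pairs of a single series; with two series it gives cross-lag pairs.
lag_pair <- function(x, y = x, k = 1) {
  n <- length(x)
  stopifnot(length(y) == n, k >= 1, k < n)
  cbind(x[1:(n - k)], y[(1 + k):n])
}

series <- c(0.1, -0.3, 0.2, 0.5, -0.1)
lagged <- lag_pair(series, k = 1)  # 4 rows: (x_t, x_{t+1})
```

The resulting two-column matrix is what would be passed on to lg_main before calling ind_test.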

Value

A list containing the test result as well as various parameters. The elements are:

lg_object: The lg-object supplied by the user.
observed: The observed value of the test statistic.
replicated: The replicated values of the test statistic.
bootstrap_type: The bootstrap type.
block_length: The block length used for the block bootstrap.
p_value: The p-value of the test.

References

Berentsen, Geir Drage, and Dag Tjøstheim. "Recognizing and visualizing departures from independence in bivariate data using local Gaussian correlation." Statistics and Computing 24.5 (2014): 785-801.

Lacal, Virginia, and Dag Tjøstheim. "Local Gaussian autocorrelation and tests for serial independence." Journal of Time Series Analysis 38.1 (2017a): 51-71.

Lacal, Virginia, and Dag Tjøstheim. "Estimating and testing nonlinear local dependence between two time series." Journal of Business & Economic Statistics just-accepted (2017b).

Examples


    # Remember to increase the number of bootstrap samples in practical
    # implementations.

    ## Not run: 

    # Test for independence between two vectors, iid data.
    x1 <- cbind(rnorm(100), rnorm(100))
    lg_object1 <- lg_main(x1)
    test_result1 = ind_test(lg_object1,
                            bootstrap_type = "plain",
                            n_rep = 20)

    # Test for serial dependence in time series, lag 1
    data(EuStockMarkets)
    logreturns <- apply(EuStockMarkets, 2, function(x) diff(log(x)))
    x2 <- cbind(logreturns[1:100,1], logreturns[2:101, 1])
    lg_object2 <- lg_main(x2)
    test_result2 = ind_test(lg_object2,
                            bootstrap_type = "plain",
                            n_rep = 20)

    # Test for cross-dependence, lag 1
    x3 <- cbind(logreturns[1:100,1], logreturns[2:101, 2])
    lg_object3 <- lg_main(x3)
    test_result3 = ind_test(lg_object3,
                            bootstrap_type = "block",
                            n_rep = 20)
    
## End(Not run)

Function that calculates the test statistic in the independence tests.

Description

This is an auxiliary function used by the independence tests.

Usage

ind_teststat(x_replicated, lg_object, S, h)

Arguments

`x_replicated`	A sample.
`lg_object`	An lg-object.
`S`	Integration area, see `?ind_test`.
`h`	h-function for test statistic, see `?ind_test`.

Interpolate a univariate conditional density function

Description

Estimates the conditional density function for one free variable on a grid. Returns a function that interpolates between these grid points so that it can be evaluated more quickly, without new optimizations.

Usage

interpolate_conditional_density(
  lg_object,
  condition,
  nodes,
  extend = 0.3,
  gaussian_scale = lg_object$transform_to_marginal_normality
)

Arguments

`lg_object`	An object of type `lg`, as produced by the `lg_main`-function
`condition`	A vector with conditions for the variables that we condition upon. Must have exactly one fewer element than there are columns in the data, since one variable is left free
`nodes`	Either the number of equidistant nodes to generate, or a vector of nodes supplied by the user
`extend`	How far to extend the grid beyond the extreme data points, as a share of the range
`gaussian_scale`	Stay on the standard Gaussian scale, useful for the accept-reject algorithm
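The strategy of evaluating an expensive density once at a set of nodes and then interpolating can be sketched in base R as below. This is an illustration only: `expensive_density` is a hypothetical stand-in for the conditional density estimator, and the use of a natural cubic spline is an assumption, not necessarily the interpolation method used by the package.

```r
# Stand-in for an expensive conditional density evaluation. In practice
# these values would come from the conditional density estimator.
expensive_density <- function(x) dnorm(x, mean = 0.5, sd = 1.2)

# Evaluate once at equidistant nodes
nodes <- seq(-4, 5, length.out = 30)
values <- expensive_density(nodes)

# Cubic spline interpolant through the node values; cheap to evaluate
interpolated <- splinefun(nodes, values, method = "natural")

# Evaluate between the nodes without any new optimizations
approx_values <- interpolated(c(-1, 0.25, 2))
```

A denser set of nodes improves the approximation at the price of more initial evaluations.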

`lg`: A package for calculating the local Gaussian correlation in multivariate applications.

Description

The lg package provides implementations for the multivariate density estimation and the conditional density estimation methods using local Gaussian correlation as presented in Otneim & Tjøstheim (2017) and Otneim & Tjøstheim (2018).

Details

The main function is called lg_main. It takes as its argument a data set (represented by a matrix or data frame) as well as various optional configurations that are described in detail in the articles mentioned above and in the documentation of this package. In particular, this function calculates the bandwidths used for estimation, using either a plugin estimate (default) or a cross-validation estimate. If x is the data set, then the following line of code creates an lg object using the default configuration, which can be used for density estimation afterwards:

lg_object <- lg_main(x)

You can change estimation method, bandwidth selection method and other parameters by using the arguments of the lg_main function.

You can evaluate the multivariate density estimate on a grid as described in Otneim & Tjøstheim (2017) using the dlg-function as follows:

dens_est <- dlg(lg_object, grid = grid)

Assuming that the data set has p variables, you can evaluate the conditional density of the p - q first variables (counting from column 1), given the remaining q variables being equal to condition = c(v1, ..., vq), on a grid, by running

conditional_dens_est <- clg(lg_object, grid = grid, condition = condition)

References

Otneim, Håkon, and Dag Tjøstheim. "The locally gaussian density estimator for multivariate data." Statistics and Computing 27, no. 6 (2017): 1595-1616.

Otneim, Håkon, and Dag Tjøstheim. "Conditional density estimation using the local Gaussian correlation" Statistics and Computing 28, no. 2 (2018): 303-321.

Create an `lg` object

Description

Create an lg-object, that can be used to estimate local Gaussian correlations, unconditional and conditional densities, local partial correlation and for testing purposes.

Usage

lg_main(
  x,
  bw_method = "plugin",
  est_method = "1par",
  transform_to_marginal_normality = TRUE,
  bw = NULL,
  plugin_constant_marginal = 1.75,
  plugin_constant_joint = 1.75,
  plugin_exponent_marginal = -1/5,
  plugin_exponent_joint = -1/6,
  tol_marginal = 10^(-3),
  tol_joint = 10^(-3)
)

Arguments

`x`	A matrix or data frame with data, one column per variable, one row per observation.
`bw_method`	The method used for bandwidth selection. Must be either `"cv"` (cross-validation, slow, but accurate) or `"plugin"` (fast, but crude).
`est_method`	The estimation method, must be either "1par", "5par", "5par_marginals_fixed" or "trivariate" (see details).
`transform_to_marginal_normality`	Logical, `TRUE` if we want to transform our data to marginal standard normality. This is assumed by method "1par", but can of course be skipped using this argument if it has been done already.
`bw`	Bandwidth object if it has already been calculated.
`plugin_constant_marginal`	The constant `c` in `cn^a` used for finding the plugin bandwidth for locally Gaussian marginal density estimates, which we need if estimation method is "5par_marginals_fixed".
`plugin_constant_joint`	The constant `c` in `cn^a` used for finding the plugin bandwidth for estimating the pairwise local Gaussian correlation between two variables.
`plugin_exponent_marginal`	The constant `a` in `cn^a` used for finding the plugin bandwidth for locally Gaussian marginal density estimates, which we need if estimation method is "5par_marginals_fixed".
`plugin_exponent_joint`	The constant `a` in `cn^a` used for finding the plugin bandwidth for estimating the pairwise local Gaussian correlation between two variables.
`tol_marginal`	The absolute tolerance in the optimization for finding the marginal bandwidths, passed on to the `optim`-function.
`tol_joint`	The absolute tolerance in the optimization for finding the joint bandwidths. Passed on to the `optim`-function.
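The plugin bandwidth has the form `cn^a`, where n is the number of observations. The base R sketch below evaluates this expression with the default constants from the Usage section. The helper `plugin_bw` is hypothetical, and the sketch shows only the formula itself, not any further processing the package may apply.

```r
# Hypothetical helper: plugin bandwidth c * n^a
plugin_bw <- function(n, constant = 1.75, exponent = -1/6) {
  constant * n^exponent
}

# Defaults for the joint (pairwise) bandwidths with n = 100 observations
bw_joint <- plugin_bw(100)

# Defaults for the marginal bandwidths use the exponent -1/5
bw_marginal <- plugin_bw(100, exponent = -1/5)
```

Because the exponents are negative, the bandwidth shrinks as the sample size grows.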

Details

This is the main function in the package. It lets the user supply a data set and set a number of options, which are then used to prepare an lg object that can be supplied to other functions in the package, such as dlg (density estimation) and clg (conditional density estimation). The details have been laid out in Otneim & Tjøstheim (2017) and Otneim & Tjøstheim (2018).

The papers mentioned above deal with the estimation of multivariate density functions and conditional density functions. The idea is to fit a multivariate Normal locally to the unknown density function by first transforming the data to marginal standard normality, and then estimate the local correlations pairwise. The local means and local standard deviations are held fixed and constantly equal to 0 and 1 respectively to reflect the knowledge that the marginals are approximately standard normal. Use est_method = "1par" for this strategy, which means that we only estimate one local parameter (the correlation) for each pair, and note that this method requires marginally standard normal data. If est_method = "1par" and transform_to_marginal_normality = FALSE the function will throw a warning. It might be okay though, if you know that the data are marginally standard normal already.
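To illustrate what transforming data to marginal standard normality means, the base R sketch below uses a simple rank-based transformation. This is a swapped-in technique chosen for brevity: the package's own transformation follows Otneim & Tjøstheim (2017) and may differ, and the helper `to_marginal_normal` is hypothetical.

```r
# Hypothetical helper: rank-based transformation of each column to
# approximate standard normality (not the package's own transformation).
to_marginal_normal <- function(x) {
  x <- as.matrix(x)
  n <- nrow(x)
  # Scaled ranks lie strictly between 0 and 1, so qnorm() stays finite
  apply(x, 2, function(col) qnorm(rank(col) / (n + 1)))
}

set.seed(1)
x <- cbind(rexp(200), runif(200))  # clearly non-normal marginals
z <- to_marginal_normal(x)
```

After the transformation each column of `z` is approximately standard normal, while the ordering of the observations within each column, and hence the dependence between the columns, is preserved.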

The second option is est_method = "5par_marginals_fixed", which is more flexible than "1par". This method estimates univariate local Gaussian fits to each marginal, thus producing estimates of the local means and local standard deviations, $\mu_i(x_i)$ and $\sigma_i(x_i)$, which are held fixed in the next step when the pairwise local correlations are estimated. This method can in many situations provide a better fit, even if the marginals are standard normal. It also makes it possible to create a multivariate locally Gaussian fit to any density without transforming the marginals, if you for some reason want to avoid that.

The third option is est_method = "5par", which is a full nonparametric locally Gaussian fit of a bivariate density as laid out and used by Tjøstheim & Hufthammer (2013) and others. This is simply a wrapper for the localgauss-package by Berentsen et al. (2014).

A recent option is described by Otneim and Tjøstheim (2019), who allow a full trivariate fit to a three dimensional data set that is transformed to marginal standard normality in the context of their test for conditional independence (see ?ci_test for details), but this can of course be used as an option to estimate three-variate density functions as well.

References

Berentsen, Geir Drage, Tore Selland Kleppe, and Dag Tjøstheim. "Introducing localgauss, an R package for estimating and visualizing local Gaussian correlation." Journal of Statistical Software 56.1 (2014): 1-18.

Hufthammer, Karl Ove, and Dag Tjøstheim. "Local Gaussian Likelihood and Local Gaussian Correlation" PhD Thesis of Karl Ove Hufthammer, University of Bergen, 2009.

Otneim, Håkon, and Dag Tjøstheim. "The locally gaussian density estimator for multivariate data." Statistics and Computing 27, no. 6 (2017): 1595-1616.

Otneim, Håkon, and Dag Tjøstheim. "Conditional density estimation using the local Gaussian correlation" Statistics and Computing 28, no. 2 (2018): 303-321.

Otneim, Håkon, and Dag Tjøstheim. "The local Gaussian partial correlation" Working paper (2019).

Tjøstheim, D., & Hufthammer, K. O. (2013). Local Gaussian correlation: a new measure of dependence. Journal of Econometrics, 172(1), 33-48.

Examples

  x <- cbind(rnorm(100), rnorm(100), rnorm(100))

  # Quick example
  lg_object1 <- lg_main(x, bw_method = "plugin", est_method = "1par")

  # In the simulation experiments in Otneim & Tjøstheim (2017),
  # the cross-validation bandwidth selection is used:
  ## Not run: 
  lg_object2 <- lg_main(x, bw_method = "cv", est_method = "1par")
  
## End(Not run)

  # If you do not wish to transform the data to standard normality,
  # use the five parameter fit:
  lg_object3 <- lg_main(x, est_method = "5par_marginals_fixed",
                  transform_to_marginal_normality = FALSE)

  # In the bivariate case, you can use the full nonparametric fit:
  x_biv <- cbind(rnorm(100), rnorm(100))
  lg_object4 <- lg_main(x_biv, est_method = "5par",
                  transform_to_marginal_normality = FALSE)

  # Whichever method you choose, the lg-object can now be passed on
  # to the dlg- or clg-functions for evaluation of the density or
  # conditional density estimate. Control the grid with the grid
  # argument.
  grid1 <- x[1:10,]
  dens_est <- dlg(lg_object1, grid = grid1)

  # The conditional density of X1 given X2 = 1 and X3 = 0:
  grid2 <- matrix(-3:3, ncol = 1)
  c_dens_est <- clg(lg_object1, grid = grid2, condition = c(1, 0))

Calculate the local conditional covariance between two variables

Description

Wrapper for the clg function that extracts the local Gaussian conditional covariance between two variables from an object that is produced by the clg-function.

Usage

local_conditional_covariance(clg_object, coord = c(1, 2))

Arguments

`clg_object`	The object produced by the clg-function
`coord`	The variables for which the conditional covariance should be extracted

Details

This function is a wrapper for the clg-function, and extracts the estimated local conditional covariance between two variables in the data matrix (by default the first two), on the grid specified to the clg-function.

Auxiliary function for calculating the asymptotic standard deviations for the local Gaussian correlations

Description

Auxiliary function for calculating the asymptotic standard deviations for the local Gaussian correlations

Usage

make_C(r, pairs, p)

Arguments

`r`	r
`pairs`	pairs
`p`	p

Evaluate the multivariate normal

Description

Function that evaluates the multivariate normal distribution with local parameters

Usage

mvnorm_eval(eval_points, loc_mean, loc_sd, loc_cor, pairs)

Arguments

`eval_points`	A matrix of grid points
`loc_mean`	A matrix of local means, one row per grid point, one column per component
`loc_sd`	A matrix of local standard deviations, one row per grid point, one column per component
`loc_cor`	A matrix of local correlations, one row per grid point, one column per pair of variables
`pairs`	A data frame specifying the components that make up each pair

Details

Takes in a grid, where we want to evaluate the multivariate normal, and in each grid point we have a new set of parameters.
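The idea of evaluating a multivariate normal density with a new set of parameters at each grid point can be sketched in base R as follows. The function `mvnorm_local` is a hypothetical stand-in, and the layout of `pairs` (integer columns `x1` and `x2` naming the two variables in each pair) is an assumption made for this illustration.

```r
# Hypothetical stand-in for illustration. `pairs` is assumed to have integer
# columns x1 and x2 naming the two variables that make up each pair.
mvnorm_local <- function(eval_points, loc_mean, loc_sd, loc_cor, pairs) {
  p <- ncol(eval_points)
  vapply(seq_len(nrow(eval_points)), function(i) {
    # Build the local correlation matrix for grid point i
    R <- diag(p)
    for (j in seq_len(nrow(pairs))) {
      R[pairs$x1[j], pairs$x2[j]] <- loc_cor[i, j]
      R[pairs$x2[j], pairs$x1[j]] <- loc_cor[i, j]
    }
    # Local covariance matrix and the Gaussian density at this point
    Sigma <- diag(loc_sd[i, ], p) %*% R %*% diag(loc_sd[i, ], p)
    d <- eval_points[i, ] - loc_mean[i, ]
    drop(exp(-0.5 * t(d) %*% solve(Sigma, d)) / sqrt((2 * pi)^p * det(Sigma)))
  }, numeric(1))
}

pairs <- data.frame(x1 = c(1, 1, 2), x2 = c(2, 3, 3))
grid <- cbind(c(-1, 0, 1), c(-1, 0, 1), c(-1, 0, 1))

# With zero means, unit standard deviations and zero correlations this
# reduces to a product of standard normal densities
dens <- mvnorm_local(grid,
                     loc_mean = matrix(0, 3, 3),
                     loc_sd = matrix(1, 3, 3),
                     loc_cor = matrix(0, 3, 3),
                     pairs = pairs)
```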

Calculate the local Gaussian partial correlation

Description

A function that calculates the local Gaussian partial correlation for a pair of variables, given the values of some conditioning variables.

Usage

partial_cor(lg_object, grid = NULL, condition = NULL, level = NULL)

Arguments

`lg_object`	An object of type `lg`, as produced by the `lg_main`-function.
`grid`	A matrix of grid points, where we want to evaluate the density estimate. Number of columns must be equal to 2.
`condition`	A vector with conditions for the variables that we condition upon. Length of this vector must be the same as the number of variables in X3. The function will throw an error if there is any discrepancy in the dimensions of the `grid`, `condition` and data set.
`level`	Specify a level if asymptotic standard deviations and confidence intervals should be returned. If not, set to `NULL`.

Details

This function is a wrapper for the clg-function (for conditional density estimation) that returns the local conditional, or partial, correlations described by Otneim & Tjøstheim (2018). The function takes as arguments an lg-object as produced by the main lg_main-function, a grid of points where the density should be estimated, and a set of conditions.

The variables must be sorted before they are supplied to this function. It will always assume that the free variables come before the conditioning variables, see ?clg for details.

Assume that X is a stochastic vector with scalar components X1 and X2, and a possibly d-dimensional component X3. This function will thus compute the local *partial* correlation between X1 and X2 given X3 = x3.
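For orientation, the algebra behind an ordinary (global) Gaussian partial correlation can be sketched in base R as below. The helper name `partial_cor_from_matrix` and the example matrix are hypothetical. The sketch assumes that the local version applies the same kind of formula to correlations estimated locally at each grid point; the exact computation is given in the references, not here.

```r
# Partial correlation between variables 1 and 2 given the remaining ones,
# computed from a full correlation (or covariance) matrix R.
partial_cor_from_matrix <- function(R) {
  free <- 1:2
  cond <- seq_len(ncol(R))[-free]
  # Schur complement: covariance of (X1, X2) after conditioning on X3
  S <- R[free, free] - R[free, cond, drop = FALSE] %*%
    solve(R[cond, cond, drop = FALSE], R[cond, free, drop = FALSE])
  S[1, 2] / sqrt(S[1, 1] * S[2, 2])
}

# Illustrative correlation matrix for (X1, X2, X3)
R <- matrix(c(1.0, 0.5, 0.3,
              0.5, 1.0, 0.4,
              0.3, 0.4, 1.0), nrow = 3, byrow = TRUE)
rho_12_given_3 <- partial_cor_from_matrix(R)
```

With a scalar X3 this reduces to the textbook expression (r12 - r13 * r23) / sqrt((1 - r13^2) * (1 - r23^2)), and the same code handles a d-dimensional X3 because the conditioning block is inverted as a matrix.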

Value

A list containing the local partial Gaussian correlations as well as all the running parameters that have been used. The elements are:

`grid`	The grid where the estimation was performed, on the original scale.
`partial_correlations`	The estimated local partial Gaussian correlations.
`cond_density`	The estimated conditional density of X1 and X2 given X3, as described by Otneim & Tjøstheim (2018).
`transformed_grid`	The grid where the estimation was performed, on the marginal standard normal scale.
`bw`	The bandwidth object.
`partial_correlations_sd`	Estimated standard deviations of the local partial Gaussian correlations, as described in a forthcoming paper.
`partial_correlations_lower`	Lower confidence limit based on the asymptotic standard deviation.
`partial_correlations_upper`	Upper confidence limit based on the asymptotic standard deviation.

References

Otneim, Håkon, and Dag Tjøstheim. "Conditional density estimation using the local Gaussian correlation" Statistics and Computing 28, no. 2 (2018): 303-321.

Examples

  # A 3 variate example
  x <- cbind(rnorm(100), rnorm(100), rnorm(100))

  # Generate the lg-object with default settings
  lg_object <- lg_main(x)

  # Estimate the local partial Gaussian correlation between X1 and X2 given X3 = 1 on
  # a small grid
  partial_correlations <- partial_cor(lg_object,
                                     grid = cbind(-4:4, -4:4),
                                     condition = 1)

Bootstrap replication under the null hypothesis

Description

Generate bootstrap replicates under the null hypothesis that the first two variables are conditionally independent given the rest of the variables.

Usage

replicate_under_ci(
  lg_object,
  n_rep,
  nodes,
  M = NULL,
  M_sim = 1500,
  M_corr = 1.5,
  n_corr = 1.2,
  extend = 0.3
)

Arguments

`lg_object`	An object of type `lg`, as produced by the `lg_main`-function
`n_rep`	The number of replicated bootstrap samples
`nodes`	Either the number of equidistant nodes to generate, or a vector of nodes supplied by the user
`M`	The value for M in the accept-reject algorithm if already known
`M_sim`	The number of replicates to simulate in order to find a value for M
`M_corr`	Correction factor for M, to be on the safe side
`n_corr`	Correction factor for n_new, so that enough observations are usually generated on the first pass
`extend`	How far to extend the grid beyond the extreme data points when interpolating, as a share of the data range
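The roles of `M`, `M_sim`, `M_corr` and `n_corr` can be illustrated with a generic accept-reject sampler in base R. This is a sketch under stated assumptions, not the package's internal code: the Beta target, the uniform proposal and the variable `n_new` are chosen purely for illustration.

```r
set.seed(1)

# Illustrative target and proposal densities (both properly normalized)
target   <- function(x) dbeta(x, 2, 5)
proposal <- function(x) dunif(x, 0, 1)

n_new  <- 500     # number of observations we want
M_sim  <- 1500    # draws used to find a value for the bound M
M_corr <- 1.5     # safety factor for M
n_corr <- 1.2     # oversampling factor

# Estimate M as the largest observed ratio target/proposal, then inflate it
pilot <- runif(M_sim)
M <- M_corr * max(target(pilot) / proposal(pilot))

accepted <- numeric(0)
while (length(accepted) < n_new) {
  # The acceptance rate is about 1/M, so request n_corr * M candidates
  # for every observation that is still missing
  n_draw <- ceiling(n_corr * M * (n_new - length(accepted)))
  cand <- runif(n_draw)
  keep <- runif(n_draw) < target(cand) / (M * proposal(cand))
  accepted <- c(accepted, cand[keep])
}
sample_out <- accepted[seq_len(n_new)]
```

Inflating the simulated bound guards against a pilot sample that missed the true maximum of the ratio, and oversampling makes it likely that the loop finishes in a single pass.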

Transform the marginals of a multivariate data set to standard normality based on the logspline density estimator (Kooperberg and Stone, 1991). See Otneim and Tjøstheim (2017) for details.

Description

Transform the marginals of a multivariate data set to standard normality based on the logspline density estimator (Kooperberg and Stone, 1991). See Otneim and Tjøstheim (2017) for details.

Usage

trans_normal(x)

Arguments

`x`	The data matrix, one row per observation.

Value

A list containing the transformed data ($transformed_data), and a function ($trans_new) that can be used to transform grid points and obtain normalizing constants for use in density estimation functions
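The general idea of transforming each marginal to standard normality can be sketched without the logspline estimator. The example below substitutes the empirical distribution function (a rank-based normal score transform) for the logspline estimate, so it is a simplified stand-in and not what `trans_normal` computes. The function name `to_normal_scores` is hypothetical.

```r
set.seed(1)

# Marginal transformation to approximate standard normality using the
# empirical CDF (a stand-in for the logspline estimate used by trans_normal)
to_normal_scores <- function(x) {
  n <- nrow(x)
  # rank / (n + 1) keeps the values strictly inside (0, 1) before qnorm
  apply(x, 2, function(col) qnorm(rank(col) / (n + 1)))
}

x <- cbind(rexp(200), runif(200))   # clearly non-normal marginals
z <- to_normal_scores(x)

colMeans(z)        # close to 0
apply(z, 2, sd)    # close to 1
```

Unlike a smooth density estimate, the empirical CDF gives no natural way to transform new grid points or to obtain normalizing constants, which is one reason a smooth estimator such as the logspline is useful in this setting.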

References

Kooperberg, Charles, and Charles J. Stone. "A study of logspline density estimation." Computational Statistics & Data Analysis 12.3 (1991): 327-347.

Otneim, Håkon, and Dag Tjøstheim. "The locally gaussian density estimator for multivariate data." Statistics and Computing 27, no. 6 (2017): 1595-1616.

Auxiliary function for calculating the local score function u

Description

Auxiliary function for calculating the local score function u

Usage

u(z1, z2, rho)

Arguments

`z1`	z1
`z2`	z2
`rho`	rho

Details

This function is used to estimate the asymptotic variance of the estimates.
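The formula for u is not given here. As a rough illustration of what a score function in this setting can look like, the sketch below assumes it is the derivative, with respect to rho, of the log density of a standardized bivariate normal distribution. The helpers `log_psi` and `score_rho` are hypothetical, and the sign and scaling used by the package may differ.

```r
# Log density of the standardized bivariate normal with correlation rho
log_psi <- function(z1, z2, rho) {
  -log(2 * pi) - 0.5 * log(1 - rho^2) -
    (z1^2 - 2 * rho * z1 * z2 + z2^2) / (2 * (1 - rho^2))
}

# Analytic derivative of log_psi with respect to rho (hypothetical helper,
# not the package's u; sign and scaling conventions may differ)
score_rho <- function(z1, z2, rho) {
  (rho * (1 - rho^2) + z1 * z2 * (1 + rho^2) - rho * (z1^2 + z2^2)) /
    (1 - rho^2)^2
}

# Check the analytic expression against a central finite difference
z1 <- 0.7; z2 <- -0.4; rho <- 0.3; h <- 1e-6
numeric_score <- (log_psi(z1, z2, rho + h) - log_psi(z1, z2, rho - h)) / (2 * h)
analytic_score <- score_rho(z1, z2, rho)
```

Score functions of this kind enter asymptotic variance formulas through their second moments, which is consistent with the stated use of u.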
