API Reference

Hi-LASSO model

class hi_lasso.hi_lasso.HiLasso(q1='auto', q2='auto', L=30, alpha=0.05, logistic=False, random_state=None, n_jobs=1)

Bases: object

Hi-LASSO (High-Dimensional LASSO) improves LASSO solutions for extremely high-dimensional data.

The main contributions of Hi-LASSO are as follows:

  • Rectifying systematic bias introduced by bootstrapping.
  • Refining the computation for importance scores.
  • Providing a statistical strategy to determine the number of bootstrap samples.
  • Taking advantage of the global oracle property.
  • Allowing tests of significance for feature selection with an appropriate distribution.
Parameters:
  • q1 ('auto' or int, optional [default='auto']) – The number of predictors to randomly select in Procedure 1. When set to 'auto', q1 is set to the number of samples.
  • q2 ('auto' or int, optional [default='auto']) – The number of predictors to randomly select in Procedure 2. When set to 'auto', q2 is set to the number of samples.
  • L (int [default=30]) – The minimum expected number of times each predictor is selected during bootstrapping.
  • alpha (float [default=0.05]) – Significance level used in the significance test for feature selection.
  • logistic (Boolean [default=False]) – Whether to apply the logistic regression model. For classification problems, Hi-LASSO can apply the logistic regression model.
  • random_state (int or None, optional [default=None]) – If int, random_state is the seed used by the random number generator. If None, the random number generator is the instance returned by np.random.default_rng.
  • n_jobs (None or int, optional [default=1]) – The number of jobs to run in parallel. If n_jobs is None or 0, the number of CPU cores returned by multiprocessing.cpu_count() is used, so that work is parallelized across all available cores.
Variables:
  • n (int) – number of samples.
  • p (int) – number of predictors.
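
The defaults described above can be illustrated with a small sketch. It is not taken from the Hi-LASSO source: the helper names resolve_q, resolve_n_jobs, and n_bootstraps are hypothetical, and the bootstrap count assumes each bootstrap sample draws q of the p predictors uniformly at random, so a predictor appears in B * q / p samples on average. The library may round or weight this differently.

```python
import math
import multiprocessing


def resolve_q(q, n_samples):
    """Hypothetical helper: 'auto' falls back to the number of samples."""
    if q == 'auto':
        return n_samples
    return int(q)


def resolve_n_jobs(n_jobs):
    """Hypothetical helper: None or 0 means use every available CPU core."""
    if n_jobs is None or n_jobs == 0:
        return multiprocessing.cpu_count()
    return int(n_jobs)


def n_bootstraps(L, n_predictors, q):
    """Smallest B with B * q / p >= L (assumes uniform predictor sampling)."""
    return math.ceil(L * n_predictors / q)


q1 = resolve_q('auto', n_samples=100)            # 100 samples -> q1 = 100
B = n_bootstraps(L=30, n_predictors=1000, q=q1)  # ceil(30 * 1000 / 100)
print(q1, B)  # 100 300
```

With n = 100 samples and p = 1000 predictors, the default L = 30 would under this assumption require 300 bootstrap samples, which is why n_jobs matters for run time.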

Examples

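>>> import numpy as np
>>> from hi_lasso.hi_lasso import HiLasso
>>> rng = np.random.default_rng(0)
>>> X = rng.normal(size=(50, 200))  # synthetic data, for illustration only
>>> y = X[:, 0] - 2 * X[:, 1] + rng.normal(size=50)
>>> model = HiLasso(q1='auto', q2='auto', L=30, alpha=0.05, logistic=False, random_state=None, n_jobs=1)
>>> model.fit(X, y, sample_weight=None)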
>>> model.coef_
>>> model.intercept_
>>> model.p_values_

fit(X, y, sample_weight=None)

Fit the model with Procedure 1 and Procedure 2.

Procedure 1: Compute importance scores for predictors.

Procedure 2: Compute coefficients and select variables.

Parameters:
  • X (array-like of shape (n_samples, n_predictors)) – predictor variables
  • y (array-like of shape (n_samples,)) – response variables
  • sample_weight (array-like of shape (n_samples,), default=None) – Optional weight vector for observations. If None, then samples are equally weighted.
Variables:
  • coef_ (array) – Coefficients of Hi-LASSO.
  • p_values_ (array) – P-value of each coefficient.
  • intercept_ (float) – Intercept of Hi-LASSO.
Returns:

self

Return type:

object
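
As a rough illustration of how the fitted attributes relate to the alpha parameter, the sketch below picks out the predictors whose p-value falls below the significance level. The arrays are hypothetical stand-ins for model.coef_ and model.p_values_, not real output, and whether fit already zeroes out non-significant coefficients is not stated here.

```python
import numpy as np

# Hypothetical values standing in for model.coef_ and model.p_values_.
coef = np.array([1.8, 0.0, -0.7, 0.05])
p_values = np.array([0.001, 0.90, 0.03, 0.40])
alpha = 0.05  # same meaning as the alpha constructor parameter

# Indices of predictors that pass the significance test.
selected = np.flatnonzero(p_values < alpha)
selected_coef = coef[selected]

print(selected)       # [0 2]
print(selected_coef)  # [ 1.8 -0.7]
```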

Utilities for Hi-LASSO

hi_lasso.util.standardization(X, y)

The response is mean-corrected and the predictors are standardized.

Parameters:
  • X (array-like of shape (n_samples, n_predictors)) – predictor
  • y (array-like of shape (n_samples,)) – response
Returns:

scaled_X, scaled_y, sd_X

Return type:

np.ndarray
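
A minimal NumPy sketch of the transformation described above follows. It is not the library's implementation: the function name standardization_sketch is hypothetical, the population standard deviation (ddof=0) is an assumption, and constant columns (zero standard deviation) are not handled.

```python
import numpy as np


def standardization_sketch(X, y):
    """Mean-correct y and standardize the columns of X (illustrative only)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    mean_X = X.mean(axis=0)
    sd_X = X.std(axis=0)             # assumption: population SD (ddof=0)
    scaled_X = (X - mean_X) / sd_X   # zero mean, unit SD per column
    scaled_y = y - y.mean()          # response is only mean-corrected
    return scaled_X, scaled_y, sd_X


X = [[1.0, 2.0], [3.0, 6.0], [5.0, 10.0]]
y = [1.0, 2.0, 3.0]
scaled_X, scaled_y, sd_X = standardization_sketch(X, y)
print(scaled_y)  # [-1.  0.  1.]
```

Returning sd_X alongside the scaled arrays makes it possible to map coefficients estimated on the standardized predictors back to the original scale.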