API Reference

Hi-LASSO model

class hi_lasso.hi_lasso.HiLasso(q1='auto', q2='auto', L=30, alpha=0.05, logistic=False, random_state=None, n_jobs=1)

Bases: object

Hi-LASSO (High-Dimensional LASSO) improves LASSO solutions for extremely high-dimensional data.

The main contributions of Hi-LASSO are as follows:

Rectifying systematic bias introduced by bootstrapping.
Refining the computation for importance scores.
Providing a statistical strategy to determine the number of bootstrap samples.
Taking advantage of the global oracle property.
Allowing tests of significance for feature selection with appropriate distribution.

Parameters:

q1 ('auto' or int, optional [default='auto']) – The number of predictors to select randomly in Procedure 1. When set to 'auto', q1 is set to the number of samples.
q2 ('auto' or int, optional [default='auto']) – The number of predictors to select randomly in Procedure 2. When set to 'auto', q2 is set to the number of samples.
L (int [default=30]) – The expected lower bound on the number of times each predictor is selected during bootstrapping.
alpha (float [default=0.05]) – Significance level used in the significance test for feature selection.
logistic (Boolean [default=False]) – Whether to apply a logistic regression model. For classification problems, Hi-LASSO can use logistic regression.
random_state (int or None, optional [default=None]) – If int, random_state is the seed used by the random number generator. If None, the random number generator is the instance obtained from np.random.default_rng.
n_jobs (None or int, optional [default=1]) – The number of jobs to run in parallel. If n_jobs is None or n_jobs == 0, the number of CPU cores returned by multiprocessing.cpu_count() is used, so the work is parallelized across all available cores.
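
The 'auto' and n_jobs rules described in the parameter list can be sketched as follows. This is a minimal illustration rather than the library's code: the helper names resolve_q and resolve_n_jobs are hypothetical, and only the documented behavior (q falls back to the number of samples; None or 0 means all cores reported by multiprocessing.cpu_count()) is taken from the descriptions above.

```python
import multiprocessing


def resolve_q(q, n_samples):
    """Hypothetical helper: map q1/q2 to a concrete predictor count."""
    # 'auto' means "use the number of samples", per the parameter description.
    if q == 'auto':
        return n_samples
    return int(q)


def resolve_n_jobs(n_jobs):
    """Hypothetical helper: map n_jobs to a concrete worker count."""
    # None or 0 requests every core reported by multiprocessing.cpu_count().
    if n_jobs is None or n_jobs == 0:
        return multiprocessing.cpu_count()
    return int(n_jobs)


q1 = resolve_q('auto', n_samples=100)   # resolves to 100
workers = resolve_n_jobs(1)             # resolves to 1
all_cores = resolve_n_jobs(None)        # machine dependent
```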

Variables:

n (int) – number of samples.
p (int) – number of predictors.
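
To see how L relates to the number of bootstrap samples, consider a simplified setting in which each bootstrap sample draws q of the p predictors uniformly at random. Each predictor is then expected to appear B * q / p times over B samples, so requiring at least L expected selections gives B >= L * p / q. The sketch below computes that bound. The function name and the uniform-sampling assumption are illustrative only, and the library's own formula or rounding may differ.

```python
def min_bootstraps(L, p, q):
    """Smallest integer B with B * q / p >= L.

    Assumes uniform sampling of q out of p predictors per bootstrap.
    Illustrative only: not taken from the hi_lasso source.
    """
    if L <= 0 or p <= 0 or q <= 0:
        raise ValueError("L, p and q must be positive integers")
    # Integer ceiling of L * p / q, avoiding floating-point rounding.
    return (L * p + q - 1) // q


# Example: p = 10,000 predictors, q = n = 100 samples, L = 30.
B = min_bootstraps(L=30, p=10_000, q=100)
expected_selections = B * 100 / 10_000
```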

Examples

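>>> from hi_lasso.hi_lasso import HiLasso
>>> # The significance level is set through alpha in the constructor; fit takes X, y and sample_weight.
>>> model = HiLasso(q1='auto', q2='auto', L=30, alpha=0.05, logistic=False, random_state=None, n_jobs=1)
>>> model.fit(X, y, sample_weight=None)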

>>> model.coef_
>>> model.intercept_
>>> model.p_values_
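
The example above assumes that X and y already exist. A minimal sketch of building a small synthetic high-dimensional data set with NumPy is shown here. The sizes, seed, and coefficient values are arbitrary illustrations and are not taken from the Hi-LASSO documentation.

```python
import numpy as np

rng = np.random.default_rng(0)      # fixed seed for reproducibility

n_samples, n_predictors = 50, 500   # far more predictors than samples
X = rng.standard_normal((n_samples, n_predictors))

# Only the first five predictors carry signal; the rest are pure noise.
true_coef = np.zeros(n_predictors)
true_coef[:5] = [3.0, -2.0, 1.5, -1.0, 2.5]

# Linear response with a small amount of Gaussian noise.
y = X @ true_coef + 0.1 * rng.standard_normal(n_samples)
```

Arrays built this way have the shapes that fit expects: X of shape (n_samples, n_predictors) and y of shape (n_samples,).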

fit(X, y, sample_weight=None)

Fit the model with Procedure 1 and Procedure 2.

Procedure 1: Compute importance scores for predictors.

Procedure 2: Compute coefficients and select variables.

Parameters:

X (array-like of shape (n_samples, n_predictors)) – Predictor variables.
y (array-like of shape (n_samples,)) – Response variable.
sample_weight (array-like of shape (n_samples,), default=None) – Optional weight vector for observations. If None, samples are equally weighted.

Variables:

coef_ (array) – Coefficients of Hi-LASSO.
p_values_ (array) – P-value of each coefficient.
intercept_ (float) – Intercept of Hi-LASSO.

Returns:	self
Return type:	object
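
Once fit has run, the p-values can be compared against the significance level alpha to see which predictors pass the test. The sketch below uses stand-in arrays rather than a fitted model, so the numbers are made up for illustration. With a real model, the arrays would come from the fitted coefficient and p-value attributes.

```python
import numpy as np

alpha = 0.05  # significance level, matching the HiLasso default

# Stand-in values for illustration only (not real Hi-LASSO output).
coef = np.array([1.8, 0.0, -0.7, 0.0, 0.3])
p_values = np.array([0.001, 0.80, 0.02, 0.45, 0.20])

# Indices of predictors whose p-value is below the significance level.
selected = np.flatnonzero(p_values < alpha)

for idx in selected:
    print(f"predictor {idx}: coef={coef[idx]:+.2f}, p={p_values[idx]:.3f}")
```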

Utilities for Hi-LASSO

hi_lasso.util.standardization(X, y)

Mean-correct the response and standardize the predictors.

Parameters:

X (array-like of shape (n_samples, n_predictors)) – Predictors.
y (array-like of shape (n_samples,)) – Response.

Returns:	scaled_X, scaled_y, sd_X
Return type:	tuple of np.ndarray
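
What this kind of preprocessing does can be sketched with plain NumPy. The function below is an illustrative re-implementation under stated assumptions, not the hi_lasso.util source: the name standardization_sketch is hypothetical, predictors are assumed to be centered as well as scaled, and the population standard deviation (ddof=0) is assumed. The library may make different choices.

```python
import numpy as np


def standardization_sketch(X, y):
    """Illustrative only; not the hi_lasso.util implementation."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)

    mean_X = X.mean(axis=0)
    sd_X = X.std(axis=0)             # assumption: population SD (ddof=0)

    scaled_X = (X - mean_X) / sd_X   # zero mean, unit SD per column
    scaled_y = y - y.mean()          # mean-corrected response
    return scaled_X, scaled_y, sd_X


X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 60.0]])
y = np.array([1.0, 2.0, 6.0])
scaled_X, scaled_y, sd_X = standardization_sketch(X, y)
```

Returning sd_X alongside the scaled arrays makes it possible to map coefficients estimated on the scaled predictors back to the original scale by dividing by sd_X. Note that this sketch does not guard against constant columns, whose standard deviation is zero.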