API Reference
Hi-LASSO model
class hi_lasso.hi_lasso.HiLasso(q1='auto', q2='auto', L=30, alpha=0.05, logistic=False, random_state=None, n_jobs=1)
Bases: object
Hi-LASSO (High-Dimensional LASSO) improves LASSO solutions for extremely high-dimensional data.
The main contributions of Hi-LASSO are as follows:
- Rectifying systematic bias introduced by bootstrapping.
- Refining the computation for importance scores.
- Providing a statistical strategy to determine the number of bootstrap samples.
- Taking advantage of the global oracle property.
- Allowing tests of significance for feature selection with an appropriate distribution.
Parameters: - q1 ('auto' or int, optional [default='auto']) – The number of predictors to randomly select in Procedure 1. When set to 'auto', q1 is set to the number of samples.
- q2 ('auto' or int, optional [default='auto']) – The number of predictors to randomly select in Procedure 2. When set to 'auto', q2 is set to the number of samples.
- L (int [default=30]) – Lower bound on the expected number of times each predictor is selected over the bootstrap samples.
- alpha (float [default=0.05]) – Significance level used in the significance test for feature selection.
- logistic (bool [default=False]) – Whether to apply a logistic regression model. For classification problems, Hi-LASSO can apply the logistic regression model.
- random_state (int or None, optional [default=None]) – If int, random_state is the seed used by the random number generator. If None, the random number generator is created by np.random.default_rng without a fixed seed.
- n_jobs (None or int, optional [default=1]) – The number of jobs to run in parallel. If n_jobs is None or n_jobs == 0, the number of CPU cores returned by multiprocessing.cpu_count() is used, so the work is parallelized across all available cores.
Variables: - n (int) – number of samples.
- p (int) – number of predictors.
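The 'auto' and n_jobs rules described above, and the role of L, can be sketched with two small helpers. This is an illustrative sketch and not the library's own code: resolve_settings and min_bootstraps are hypothetical names, and the bootstrap count assumes that each bootstrap sample draws q of the p predictors uniformly at random, so that a predictor is expected to be selected B * q / p times over B bootstrap samples.

```python
import multiprocessing


def resolve_settings(q1, q2, n_jobs, n_samples):
    """Resolve 'auto' and None/0 values following the documented parameter rules.

    Hypothetical helper for illustration; not part of hi_lasso.
    """
    # 'auto' means: use the number of samples as the number of predictors to draw.
    q1 = n_samples if q1 == 'auto' else int(q1)
    q2 = n_samples if q2 == 'auto' else int(q2)
    # None or 0 means: use every core reported by the operating system.
    if n_jobs is None or n_jobs == 0:
        n_jobs = multiprocessing.cpu_count()
    return q1, q2, n_jobs


def min_bootstraps(p, q, L=30):
    """Smallest B with B * q / p >= L, assuming uniform draws of q out of p predictors."""
    # Integer ceiling of L * p / q, avoiding floating-point rounding.
    return (L * p + q - 1) // q


if __name__ == "__main__":
    q1, q2, n_jobs = resolve_settings('auto', 'auto', 1, n_samples=100)
    print(q1, q2, n_jobs)
    print(min_bootstraps(p=1000, q=q1, L=30))
```

Under that uniform-sampling assumption, 1000 predictors with q = 100 and L = 30 need at least 300 bootstrap samples. Whether the library rounds up or down at this step is not stated in this reference.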
Examples
>>> from hi_lasso.hi_lasso import HiLasso
>>> model = HiLasso(q1='auto', q2='auto', L=30, alpha=0.05, logistic=False, random_state=None, n_jobs=1)
>>> model.fit(X, y, sample_weight=None)
>>> model.coef_
>>> model.intercept_
>>> model.p_values_
fit(X, y, sample_weight=None)
Fit the model with Procedure 1 and Procedure 2.
Procedure 1: Compute importance scores for predictors.
Procedure 2: Compute coefficients and select variables.
Parameters: - X (array-like of shape (n_samples, n_predictors)) – predictor variables
- y (array-like of shape (n_samples,)) – response variables
- sample_weight (array-like of shape (n_samples,), default=None) – Optional weight vector for observations. If None, then samples are equally weighted.
Variables: - coef_ (array) – Coefficients of Hi-LASSO.
- p_values_ (array) – P-value of each coefficient.
- intercept_ (float) – Intercept of Hi-LASSO.
Returns: self
Return type: object
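After fitting, the coefficients and p-values can be combined to list the predictors that pass the significance test. The following is a minimal sketch under stated assumptions: significant_predictors is a hypothetical helper, the input values are made up for illustration, the two sequences are assumed to be aligned by predictor index (as model.coef_ and model.p_values_ in the Examples suggest), and the strict comparison p < alpha is an assumption. This reference does not say whether the fitted coefficients are already zeroed for non-significant predictors.

```python
def significant_predictors(coef, p_values, alpha=0.05):
    """Return (index, coefficient) pairs whose p-value is below alpha.

    Hypothetical helper for illustration; not part of hi_lasso.
    """
    if len(coef) != len(p_values):
        raise ValueError("coef and p_values must have the same length")
    return [
        (j, c)
        for j, (c, pv) in enumerate(zip(coef, p_values))
        if pv < alpha
    ]


if __name__ == "__main__":
    # Made-up values standing in for model.coef_ and model.p_values_.
    coef = [0.8, 0.0, -1.2, 0.1]
    p_values = [0.001, 0.9, 0.03, 0.2]
    print(significant_predictors(coef, p_values, alpha=0.05))
    # prints [(0, 0.8), (2, -1.2)]
```

With a fitted model, the same call would take the model.coef_ and model.p_values_ arrays together with the alpha passed to the constructor.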