nmfprofiler package
Submodules
nmfprofiler.nmfprofiler module
NMFProfiler: A multi-omics integration method for samples stratified in groups
- class nmfprofiler.nmfprofiler.NMFProfiler(omic1, omic2, y, params={'alpha': 2, 'eta1': 1.0, 'eta2': 1.0, 'gamma': 0.01, 'lambda': 0.001, 'm_back': 1, 'mu': 0.001, 'sigma': 1e-09}, init_method='random2', solver='analytical', as_sklearn=True, backtrack=False, max_iter_back=100, tol=0.0001, max_iter=1000, seed=None, verbose=False)
Bases:
object
A multi-omics integration method for samples stratified in groups
The goal of the method is to find relationships between omics that correspond to typical profiles of distinct groups of individuals. It seeks two decompositions, one per omic, that share a common matrix of individual contributions and whose latent factor matrices are sparse.
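As an illustration of "two decompositions with a common contribution of individuals", the sketch below runs generic multiplicative updates for a plain, unregularized joint NMF on the toy data from the Examples section. It is not NMFProfiler's solver: the function name `joint_nmf_mu`, the stacking trick, and the absence of the supervised and sparsity terms are all assumptions made for the illustration.

```python
import numpy as np

def joint_nmf_mu(X1, X2, k=2, n_iter=200, seed=0, eps=1e-10):
    """Generic multiplicative updates for X1 ~ W @ H1 and X2 ~ W @ H2.

    Hypothetical helper: it omits the supervised and penalty terms of
    NMFProfiler and only shows how one W can be shared by two omics.
    """
    rng = np.random.default_rng(seed)
    # With a shared W, the two reconstruction errors add up to the error
    # of a single factorization of the column-stacked matrix [X1 X2].
    X = np.hstack([X1, X2]).astype(float)
    n, p = X.shape
    W = rng.random((n, k)) + eps
    H = rng.random((k, p)) + eps
    errors = []
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)    # update latent components
        W *= (X @ H.T) / (W @ (H @ H.T) + eps)  # update shared contributions
        errors.append(0.5 * np.linalg.norm(X - W @ H, "fro") ** 2)
    p1 = X1.shape[1]
    return W, H[:, :p1], H[:, p1:], errors

X1 = np.array([[1, 1.8, 1], [2, 3.2, 1], [1.5, 2.8, 1],
               [4.1, 0.7, 0.1], [5.01, 0.8, 0.1], [6.2, 0.9, 0.1]])
X2 = X1 + 1.0  # same values as X2 in the Examples section

W, H1, H2, errors = joint_nmf_mu(X1, X2)
print(W.shape, H1.shape, H2.shape)  # (6, 2) (2, 3) (2, 3)
```

The returned shapes match the W, H1 and H2 attributes documented below: one row of W per sample, and one row of each H per latent component.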
The objective function \(\mathcal{F} (\mathbf{W},\mathbf{H}^{(1)},\mathbf{H}^{(2)},\beta^{(1)},\beta^{(2)})\) is as follows:
\[ \begin{aligned}& \dfrac{1}{2}\left( \sum_{j=1}^2\| \mathbf{X}^{(j)} - \mathbf{WH}^{(j)} \|_F^2 \right)\\&+ \dfrac{\gamma}{2}\left( \sum_{j=1}^2\| \mathbf{Y} - \mathbf{X}^{(j)} \mathbf{H}^{(j)\top} \text{Diag}(\beta^{(j)}) \|_F^2 \right)\\&+ \sum_{j=1}^{2} \lambda\|\mathbf{H}^{(j)}\|_1 + \dfrac{\mu}{2}\|\mathbf{W}\|_F^2\end{aligned} \]
Parameters
- omic1:
array-like of shape (n_samples x n_features_omic1). First omics dataset. (n_samples) is the number of samples and (n_features_omic1) the number of features.
- omic2:
array-like of shape (n_samples x n_features_omic2). Second omics dataset. (n_samples) is the number of samples and (n_features_omic2) is the number of features. WARNING: (omic2) must contain the exact same samples in the same order as (omic1).
- y:
vector of length (n_samples). Group to which each sample belongs (in the same order as the rows of omic1 and omic2).
- params:
dict of length 8, optional. Contains, in this order, values for hyperparameters gamma, lambda, mu (from the objective function), for eta1, eta2 (when proximal optimization is used), and for alpha, sigma, m_back (for linesearch()). By default, gamma = 1e-2, lambda = 1e-3, mu = 1e-3, eta1 = eta2 = 1, alpha = 2, sigma = 1e-9, and m_back = 1. In the objective function, lambda and gamma are additionally multiplied by (n_samples).
- init_method:
str, optional. Initialization method. One of {‘random2’, ‘random3’, ‘nndsvd2’, ‘nndsvd3s’, ‘nndsvd’, ‘nndsvda’, ‘nndsvdar’}. Initializations are based on the _initialize_nmf function of the sklearn.decomposition.NMF module. In addition, for ‘random2’ and ‘random3’, values are drawn from a standard Normal distribution (with 0 mean and standard deviation equal to 1). By default, init_method = ‘random2’. See _initialize_nmf() for further information.
- solver:
str, optional. Solver type for the optimization problem. One of ‘analytical’ (analytical differentiation) or ‘autograd’ (automatic differentiation). Note the latter solver is not implemented in the current version, but should be released in future versions. By default, solver = ‘analytical’.
- as_sklearn:
boolean, optional. If True, the solver uses multiplicative update (MU) rules. If False, it uses a proximal optimization strategy. By default, as_sklearn = True.
- backtrack:
boolean, optional. When as_sklearn = False, whether or not to perform Backtrack LineSearch. By default, backtrack = False.
- max_iter_back:
int, optional. When backtrack = True, maximum number of iterations for the Backtrack LineSearch. By default, max_iter_back = 100.
- tol:
float, optional. Tolerance for the stopping condition. By default, tol = 1e-4.
- max_iter:
int, optional. Maximum number of allowed iterations. By default, max_iter = 1000.
- seed:
int, optional. Random seed to ensure reproducibility of results. By default, seed = None.
- verbose:
boolean, optional. Verbose optimization process. By default, verbose = False.
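To make the objective function and the role of gamma, lambda and mu concrete, here is a minimal NumPy sketch that evaluates it term by term. It is not the package's internal code. The function name `nmfprofiler_objective` is hypothetical, and the one-hot encoding of y into a matrix Y with one column per group is an assumption; the scaling of lambda and gamma by n_samples follows the description of params above.

```python
import numpy as np

def nmfprofiler_objective(X1, X2, Y, W, H1, H2, beta1, beta2,
                          gamma=1e-2, lam=1e-3, mu=1e-3):
    """Evaluate the documented objective (hypothetical helper)."""
    n = X1.shape[0]
    gamma_n, lam_n = gamma * n, lam * n  # documented scaling by n_samples
    omics = ((X1, H1, beta1), (X2, H2, beta2))
    # Reconstruction term: 1/2 * sum_j ||X_j - W H_j||_F^2
    recon = 0.5 * sum(np.linalg.norm(X - W @ H, "fro") ** 2
                      for X, H, _ in omics)
    # Supervised term: gamma/2 * sum_j ||Y - X_j H_j^T Diag(beta_j)||_F^2
    # np.ravel lets beta be given with shape (2,) or (2, 1).
    sup = 0.5 * gamma_n * sum(
        np.linalg.norm(Y - X @ H.T @ np.diag(np.ravel(b)), "fro") ** 2
        for X, H, b in omics)
    # Sparsity term: lambda * (||H1||_1 + ||H2||_1)
    sparsity = lam_n * (np.abs(H1).sum() + np.abs(H2).sum())
    # Ridge term on W: mu/2 * ||W||_F^2
    ridge = 0.5 * mu * np.linalg.norm(W, "fro") ** 2
    return recon + sup + sparsity + ridge

# Toy data from the Examples section.
X1 = np.array([[1, 1.8, 1], [2, 3.2, 1], [1.5, 2.8, 1],
               [4.1, 0.7, 0.1], [5.01, 0.8, 0.1], [6.2, 0.9, 0.1]])
X2 = X1 + 1.0
y = np.array([1, 1, 1, 0, 0, 0])
Y = np.eye(2)[y]  # assumed one-hot encoding, shape (n_samples, 2)

rng = np.random.default_rng(0)
W, H1, H2 = rng.random((6, 2)), rng.random((2, 3)), rng.random((2, 3))
value = nmfprofiler_objective(X1, X2, Y, W, H1, H2, np.ones(2), np.ones(2))
print(value)
```

Evaluating the terms separately like this can help when reading df_errors, which stores the error terms at each iteration.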
Attributes
- W:
ndarray of shape (n_samples x 2). Contributions of individuals in each latent component.
- W_init:
ndarray of shape (n_samples x 2). Initial version of (W).
- H1:
ndarray of shape (2 x n_features_omic1). Latent components for (omic1).
- H1_init:
ndarray of shape (2 x n_features_omic1). Initial version of (H1).
- H2:
ndarray of shape (2 x n_features_omic2). Latent components for (omic2).
- H2_init:
ndarray of shape (2 x n_features_omic2). Initial version of (H2).
- Beta1:
ndarray of shape (2 x 1). Regression coefficients for the projection of individuals from (omic1) onto (H1).
- Beta1_init:
ndarray of shape (2 x 1). Initial version of (Beta1).
- Beta2:
ndarray of shape (2 x 1). Regression coefficients for the projection of individuals from (omic2) onto (H2).
- Beta2_init:
ndarray of shape (2 x 1). Initial version of (Beta2).
- n_iter:
int. Final number of iterations (reached either at convergence or when the maximum number of iterations is hit).
- df_etas:
pd.DataFrame of shape (n_iter+1, 2). Optimal values for parameters eta1 and eta2 at each iteration.
- df_errors:
pd.DataFrame of shape (n_iter+1, 9). All error terms for each iteration and omic j.
- df_ldaperf:
pd.DataFrame of shape (n_iter+1, 13). All metrics linked to LDA at each iteration and omic j.
- df_grads:
pd.DataFrame of shape (n_iter+1, 2). Values of H^(1) and H^(2) gradients before being updated, at each iteration.
- runningtime:
float. Running time of the method measured through process_time().
… : all inputs passed to NMFProfiler().
References
C. Boutsidis and E. Gallopoulos. SVD based initialization: A head start for nonnegative matrix factorization. Pattern Recognition. Volume 41, Issue 4. 2008. Pages 1350-1362. https://doi.org/10.1016/j.patcog.2007.09.010.
J. Leuschner, M. Schmidt, P. Fernsel, D. Lachmund, T. Boskamp, and P. Maass. Supervised non-negative matrix factorization methods for MALDI imaging applications. Bioinformatics. Volume 35. 2019. Pages 1940-1947. https://doi.org/10.1093/bioinformatics/bty909.
S. Zhang, C.-C. Liu, W. Li, H. Shen, P. W. Laird, and X. J. Zhou. Discovery of multi-dimensional modules by integrative analysis of cancer genomic data. Nucleic acids research. Volume 40, Issue 19. 2012. Pages 9379-9391. https://doi.org/10.1093/nar/gks725.
A. Mercadie, E. Gravier, G. Josse, N. Vialaneix, and C. Brouard. NMFProfiler: A multi-omics integration method for samples stratified in groups. Preprint submitted for publication.
Examples
>>> import numpy as np
>>> X1 = np.array([[1, 1.8, 1],
...                [2, 3.2, 1],
...                [1.5, 2.8, 1],
...                [4.1, 0.7, 0.1],
...                [5.01, 0.8, 0.1],
...                [6.2, 0.9, 0.1]])
>>> X2 = np.array([[2, 2.8, 2],
...                [3, 4.2, 2],
...                [2.5, 3.8, 2],
...                [5.1, 1.7, 1.1],
...                [6.01, 1.8, 1.1],
...                [7.2, 1.9, 1.1]])
>>> y = np.array([1, 1, 1, 0, 0, 0])
>>> seed = 240805
>>> from nmfprofiler import NMFProfiler
>>> model = NMFProfiler(omic1=X1, omic2=X2, y=y, seed=seed)
>>> res = model.fit()
>>> print(res)
>>> res.heatmap(obj_to_viz="W", height=10, width=10, path="")
>>> model.barplot_error(height=6, width=15, path="")
- barplot_error(width, height, path)
Visualize the final error terms.
Params
- width:
int. Width of the figure (in units by default).
- height:
int. Height of the figure (in units by default).
- path:
str. Location to save the figure.
Values
Return a barplot of the different error terms.
- evolplot(obj_to_check, width, height)
Visualize the evolution of either etas values or gradients along the optimization process.
Params
- obj_to_check:
str. One of {‘etas’, ‘gradients’}.
- width:
int. Width of the figure (in units by default).
- height:
int. Height of the figure (in units by default).
Values
Return a lineplot.
- fit()
Run NMFProfiler.
- heatmap(obj_to_viz, width, height, path)
Visualize any matrix of X^j, W, H^j with a heatmap.
Params
- obj_to_viz:
str. One of {‘omic1’, ‘omic2’, ‘W’, ‘H1’, ‘H2’}.
- width:
int. Width of the figure (in units by default).
- height:
int. Height of the figure (in units by default).
- path:
str. Location to save the figure.
Values
Returns a heatmap.
- predict(new_ind, verbose=False)
Predict the group of a new sample, based on its projection onto the signature matrices H1 and H2.
Params
- new_ind:
list. List of arrays containing values of features from omic1 and omic2 for a new sample.
Values
- group:
list. Predicted group (one of 0 or 1) for the new sample in each omic.
- proj1:
array. Projection onto H1.
- proj2:
array. Projection onto H2.
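The projection-based prediction can be sketched for a single omic as follows. This is an illustration under stated assumptions, not the body of predict(): the helper name `project_and_predict` is hypothetical, the projection is taken to be x H^T Diag(beta) as in the supervised term of the objective function, and the rule "predicted group = index of the largest projected value" is an assumption about how the projection maps to a group.

```python
import numpy as np

def project_and_predict(x_new, H, beta):
    """Project one sample onto the signatures H and pick a group.

    Hypothetical helper. Assumes component k of the projection scores
    group k, so the predicted group is the argmax of the projection.
    """
    x_new = np.asarray(x_new, dtype=float)
    # x H^T Diag(beta): one score per latent component.
    proj = (x_new @ H.T) * np.ravel(beta)
    return int(np.argmax(proj)), proj

# Made-up signatures for a 3-feature omic (illustrative values only).
H1 = np.array([[1.0, 0.0, 0.0],
               [0.0, 1.0, 0.0]])
beta1 = np.array([[1.0], [1.0]])  # shape (2, 1), as documented for Beta1

group, proj1 = project_and_predict([5.0, 0.5, 0.1], H1, beta1)
print(group, proj1)
```

With the real class, new_ind is a list holding one array per omic, and the returned group list contains one prediction per omic.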