Background/Aim: Lipidomic and metabolomic techniques are becoming increasingly important in human health research. Recent developments in analytical techniques enable the investigation of large numbers of compounds. The large numbers of metabolites and lipids detected by mass spectrometric and other techniques often make it difficult for statistical methods to produce stable and interpretable results. This study aims to apply the treelet transform (TT), a novel and not yet established statistical method, to large numbers of metabolites and lipids and to compare its results with those of the established method, principal component analysis (PCA). Serum lipid and metabolite profiles are investigated regarding their association with anthropometric parameters related to obesity.

Methods: Blood samples from 226 participants of the EPIC (European Prospective Investigation into Cancer and Nutrition)-Potsdam study were analyzed for serum metabolites and lipids using an untargeted metabolomics approach. In addition, anthropometric measurements were taken to assess parameters of obesity, such as body mass index (BMI), waist-to-hip ratio (WHR) and body fat mass. TT and PCA were used to generate treelet components (TCs) and factors, respectively, which summarize serum metabolites and lipids in new latent variables without too much loss of information. Partial correlations were used to relate TCs and factors to anthropometry, controlling for relevant parameters such as sex and age.

Results: TT on the metabolite variables (p = 121) resulted in 5 stable and interpretable TCs explaining 18.9% of the variance within the data. PCA on the same variables generated 4 rather complex, less easily interpretable factors explaining 37.5% of the variance. TT on the lipidomic data (p = 353) produced 3 TCs, and PCA on the same data produced 3 factors; the proportion of explained variance was 17.8% for TT and 39.8% for PCA. In both analyses, TT yielded stable components that were easier to interpret than the PCA factors. In general, the TCs and factors were similar in structure with respect to the original variables loading highly on them. Both TCs and factors showed associations with anthropometric measures.

Conclusions: TT is a suitable statistical method to generate summarizing latent variables in data sets with more variables than observations. In the present investigation it produced latent variables similar to those of the established method, PCA. Although the summarizing constructs of TT explain less variance than the PCA factors, TCs are easier to interpret. In addition, the resulting TCs are quite stable in bootstrap samples.
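
The partial correlations mentioned in the Methods can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `pearson` and `partial_corr` and the toy values are assumptions made here for illustration only, and the standard recursive formula is used to remove one control variable at a time.

```python
import math


def pearson(x, y):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def partial_corr(x, y, controls):
    """Correlation of x and y with the control variables held constant.

    Uses the recursive formula
    r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2)),
    applied to one control variable at a time.
    """
    if not controls:
        return pearson(x, y)
    z, rest = controls[0], controls[1:]
    r_xy = partial_corr(x, y, rest)
    r_xz = partial_corr(x, z, rest)
    r_yz = partial_corr(y, z, rest)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Toy values (hypothetical, not study data): a component score, BMI,
# and two control variables standing in for age and sex.
component = [0.2, 1.1, -0.5, 0.9, 1.8, -1.2, 0.4, 1.5]
bmi = [24.1, 27.3, 22.8, 26.0, 30.2, 21.5, 25.7, 28.4]
age = [45.0, 52.0, 38.0, 60.0, 57.0, 41.0, 49.0, 63.0]
sex = [0.0, 1.0, 0.0, 1.0, 1.0, 0.0, 1.0, 0.0]

r = partial_corr(component, bmi, [age, sex])
print(f"partial correlation (controlling for age and sex): {r:.3f}")
```

For a single control variable, the recursion reduces to the familiar first-order partial correlation; with several control variables it gives the same value as correlating the residuals of `x` and `y` after regressing each on all controls.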
- treelet transform
- principal component analysis
Foerster, J., Hyötyläinen, T., Oresic, M., Nygren, H., & Boeing, H. (2015). Serum lipid and serum metabolite components in relation to anthropometric parameters in EPIC-Potsdam participants. Metabolism: Clinical and Experimental, 64(10), 1348-1358. https://doi.org/10.1016/j.metabol.2015.07.004