
Meeting April 23, 2024


Training with simulated data for Machine Learning

By Lorenzo Lope-Uroz

Lorenzo presented his current research on measuring the displacement of volcanic rocks to gain insights into the underlying magma chambers.

He highlighted a critical issue: the limited amount of training data available for this study. To address it, he uses simulations of surface displacements to generate synthetic training data. These simulations account for the atmospheric noise that can disrupt satellite measurements, and they are fast enough to run in real time during model training. He also discussed the difficulty of matching the parameter distributions used during training to their real-world counterparts, proposing to slice the distributions randomly by quantiles so that new parameter distributions can be included regardless of their shape.
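The quantile-slicing idea can be sketched as follows. This is a minimal illustration, not Lorenzo's actual implementation: the function name and the log-normal "depth" parameter are hypothetical, and the sketch simply partitions an empirical distribution into equal-probability quantile slices and samples each slice equally often, so the training set covers the full parameter range regardless of how skewed the real-world distribution is.

```python
import numpy as np

rng = np.random.default_rng(42)

def sample_by_quantile_slices(values, n_samples, n_slices=10):
    """Draw training samples from an empirical distribution by first
    picking a quantile slice at random, then sampling uniformly within
    that slice. Every slice is visited with equal probability, which
    decouples training coverage from the distribution's shape."""
    edges = np.quantile(values, np.linspace(0.0, 1.0, n_slices + 1))
    samples = np.empty(n_samples)
    for i in range(n_samples):
        s = rng.integers(n_slices)                    # random slice
        samples[i] = rng.uniform(edges[s], edges[s + 1])
    return samples

# Example: a heavily skewed "real-world" parameter (hypothetical)
depths = rng.lognormal(mean=1.0, sigma=0.8, size=10_000)
train_depths = sample_by_quantile_slices(depths, n_samples=1_000)
```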

A paper on this research is currently in the process of being published.

Spatially constrained online dictionary learning for source separation in hyperspectral data

By Argheesh Bhanot

Argheesh focused on the challenges of source separation in astrophysics using hyperspectral data.

He explained the distinction between pure and mixed pixels; pure pixels, like water, show a single spectrum which is straightforward to analyze, whereas mixed pixels, such as those containing both rock and vegetation, display mixed spectra and are more complex to decipher.
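Under the common linear mixing assumption, a mixed pixel's spectrum is a weighted combination of the pure ("endmember") spectra, with non-negative abundances summing to one. A minimal sketch, with purely illustrative spectra (the Gaussian shapes and wavelengths below are made up, not real measurements):

```python
import numpy as np

# Two hypothetical endmember spectra (illustrative shapes only)
wavelengths = np.linspace(400, 2500, 50)                         # nm
rock = 0.3 + 0.2 * np.exp(-((wavelengths - 1600) / 500) ** 2)
vegetation = 0.1 + 0.5 * np.exp(-((wavelengths - 800) / 150) ** 2)

# A mixed pixel: 60% rock, 40% vegetation (abundances sum to 1)
abundances = np.array([0.6, 0.4])
mixed_pixel = abundances @ np.vstack([rock, vegetation])
```

Recovering the endmembers and abundances from many such mixed pixels is exactly the source-separation (unmixing) problem.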

His approach involves adding specific constraints to the loss function and employing alternating minimization on two parameters to tackle this problem. While each parameter’s problem is convex, there’s no definitive proof of convergence to a global minimum, though the method has shown promising practical results.
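The alternating structure can be illustrated with a bare-bones non-negative factorization. This is only a sketch of the alternating-minimization pattern, using classic multiplicative updates rather than Argheesh's spatially constrained loss; it shows why each subproblem is convex (one factor fixed) while the joint problem is not:

```python
import numpy as np

rng = np.random.default_rng(0)

def alternating_nmf(X, k, n_iters=200, eps=1e-9):
    """Minimal alternating minimization for X ≈ D @ A with
    non-negativity constraints (multiplicative updates).
    With D fixed the problem in A is convex, and vice versa,
    but jointly only a local minimum is guaranteed."""
    m, n = X.shape
    D = rng.random((m, k))       # dictionary (e.g. source spectra)
    A = rng.random((k, n))       # abundances / mixing coefficients
    for _ in range(n_iters):
        A *= (D.T @ X) / (D.T @ D @ A + eps)   # update A, D fixed
        D *= (X @ A.T) / (D @ A @ A.T + eps)   # update D, A fixed
    return D, A

# Synthetic mixture of 3 non-negative sources
D_true = rng.random((40, 3))
A_true = rng.random((3, 100))
X = D_true @ A_true
D_est, A_est = alternating_nmf(X, k=3)
```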

His data come from the Hubble Space Telescope and the MUSE spectrometer on the VLT; the analysis focuses on distant galaxies observed in the near-infrared.
One key objective is to identify Lyman-alpha emission lines to determine galaxy distances. Rafelski's segmentation maps help identify regions of interest in these images.
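The distance estimate rests on a standard relation: the Lyman-alpha line has a rest-frame wavelength of 1215.67 Å, so the wavelength at which it is observed directly gives the redshift via z = λ_obs / λ_rest − 1. A minimal sketch (the 8510 Å input is just an illustrative value):

```python
LYA_REST = 1215.67  # Lyman-alpha rest-frame wavelength, in angstroms

def redshift_from_lya(observed_wavelength):
    """Redshift implied by an observed Lyman-alpha emission line."""
    return observed_wavelength / LYA_REST - 1.0

# A line observed near 8510 angstroms corresponds to z of about 6
z = redshift_from_lya(8510.0)
```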

There are currently no automated methods for unmixing overlapping sources, which is precisely what motivates the proposed method.

After analyzing the data cube, hidden sources appear in the hyperspectral data that are not present in the Rafelski map. By examining the residuals, the team identified a new galaxy at redshift z = 6, cross-validated by another study.

He posed a question on the potential applicability of these techniques to the LSST project.

Next meeting

The date for the next meeting is to be decided. The session will feature Nihal discussing the application of machine learning to the ATLAS project at CERN’s LHC.

WRITTEN BY
Thomas Vuillaume
Data Scientist