Predicting the neutral hydrogen content of galaxies from optical data using machine learning

Rafieferantsoa, Mika; Andrianomena, Sambatra; Dave, Romeel

dc.contributor.author	Rafieferantsoa, Mika
dc.contributor.author	Andrianomena, Sambatra
dc.contributor.author	Dave, Romeel
dc.date.accessioned	2018-09-04T11:58:14Z
dc.date.available	2018-09-04T11:58:14Z
dc.date.issued	2018
dc.identifier.citation	Rafieferantsoa, M. et al. (2018). Predicting the neutral hydrogen content of galaxies from optical data using machine learning. Monthly Notices of the Royal Astronomical Society, 479(4): 4509–4525	en_US
dc.identifier.issn	0035-8711
dc.identifier.uri	http://dx.doi.org/10.1093/mnras/sty1777
dc.identifier.uri	http://hdl.handle.net/10566/4005
dc.description.abstract	We develop a machine learning-based framework to predict the HI content of galaxies using more straightforwardly observable quantities such as optical photometry and environmental parameters. We train the algorithm on z = 0-2 outputs from the Mufasa cosmological hydrodynamic simulation, which includes star formation, feedback, and a heuristic model to quench massive galaxies that yields a reasonable match to a range of survey data including HI. We employ a variety of machine learning methods (regressors), and quantify their performance using the root mean square error (rmse) and the Pearson correlation coefficient (r). Considering SDSS photometry, 3rd nearest neighbor environment and line of sight peculiar velocities as features, we obtain r > 0.8 accuracy of the HI-richness prediction, corresponding to rmse < 0.3. Adding near-IR photometry to the features yields some improvement to the prediction. Among all the regressors, random forest shows the best performance, with r > 0.9 at z = 0, followed by a Deep Neural Network with r > 0.85. All regressors exhibit a declining performance with increasing redshift, which limits the utility of this approach to z ≲ 1, and they tend to somewhat over-predict the HI content of low-HI galaxies, which might be due to Eddington bias in the training sample. We test our approach on the RESOLVE survey data. Training on a subset of RESOLVE, we find that our machine learning method can reasonably well predict the HI-richness of the remaining RESOLVE data, with rmse ~ 0.28. When we train on mock data from Mufasa and test on RESOLVE, this increases to rmse ~ 0.45. Our method will be useful for making galaxy-by-galaxy survey predictions and incompleteness corrections for upcoming HI 21 cm surveys such as the LADUMA and MIGHTEE surveys on MeerKAT, over regions where photometry is already available.	en_US
dc.language.iso	en	en_US
dc.publisher	Oxford University Press	en_US
dc.rights	This is the pre-print of the article published online at: http://dx.doi.org/10.1093/mnras/sty1777
dc.subject	Galaxies	en_US
dc.subject	Evolution	en_US
dc.subject	N-body simulations	en_US
dc.subject	Statistics	en_US
dc.title	Predicting the neutral hydrogen content of galaxies from optical data using machine learning	en_US
dc.type	Article	en_US
dc.privacy.showsubmitter	FALSE
dc.status.ispeerreviewed	TRUE
dc.description.accreditation	ISI
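
The abstract scores every regressor with two metrics: the root mean square error (rmse) and the Pearson correlation coefficient (r). The snippet below is a minimal sketch of those two standard formulas in plain Python, not the authors' code; the function names `rmse` and `pearson_r` and the sample values are hypothetical, and the regressors themselves (random forest, deep neural network) are not reproduced here.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error: sqrt( (1/N) * sum_i (y_i - yhat_i)^2 )."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson r: covariance of x and y divided by the product of their spreads."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("Pearson r is undefined for constant input")
    return cov / (spread_x * spread_y)


if __name__ == "__main__":
    # Hypothetical HI-richness values for illustration only, NOT data from the paper.
    true_vals = [-1.2, -0.8, -0.3, 0.1, 0.5]
    pred_vals = [-1.0, -0.9, -0.2, 0.0, 0.6]
    print(f"rmse = {rmse(true_vals, pred_vals):.3f}")
    print(f"r    = {pearson_r(true_vals, pred_vals):.3f}")
```

The two metrics are complementary: r measures only linear correlation, so a prediction with a constant offset can reach r = 1 while its rmse stays large, which is why reporting both is informative. On Python 3.10 or later, `statistics.correlation` also computes Pearson's r.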

This item appears in the following Collection(s)

Research Articles (Physics)