Cameron, Andrew. Information Compression of Molecular Representations Using Neural Network Auto-Encoders. University of Prince Edward Island, 2017, https://scholar2.islandarchives.ca/islandora/object/ir%3A20892.

Genre

  • Honours
Contributors
Author: Cameron, Andrew
Thesis advisor: Pearson, Jason
Thesis advisor: Lawther, Derek
Date Issued
2017
Publisher
University of Prince Edward Island
Place Published
Charlottetown, PE
Extent
75
Abstract

As quantum chemistry makes increasingly strong predictions about the physical observables of chemical systems, and as larger datasets of quantum chemical data become public, more theorists are employing machine learning algorithms to make these predictions. Machine learning algorithms are limited by the quality of the data they are given. The molecular representations currently used to train them contain unnecessary information, which makes it difficult for data-driven methods to make accurate predictions. This work attempts to overcome the issue by compressing molecular representations with neural network auto-encoders. Three representations (Cartesian coordinates, Coulomb matrices, and position intracules) were evaluated using auto-encoders, with the aim of optimizing reproducibility. Cartesian coordinates of 134,000 small organic molecules were compressed with an average in-sample reproducibility error of 0.4982 Å per coordinate and an information compression of 39.08%. Coulomb matrices of the same 134,000 molecules were compressed with an average in-sample reproducibility error of 0.2414 Å⁻¹ per matrix element and an information compression of 93.94%. Position intracules of 21,271 small organic molecules were compressed with an average in-sample reproducibility error of 0.8926 and an information compression of 29.30%.
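
For readers unfamiliar with the representations named in the abstract, the following is a minimal Python sketch of one of them, the Coulomb matrix, using its commonly cited definition (0.5·Z_i^2.4 on the diagonal, Z_i·Z_j/|R_i − R_j| off the diagonal). It is not taken from the thesis. The function names `coulomb_matrix` and `compression` are hypothetical, and the compression formula (one minus the ratio of encoded size to input size) is only an assumption about how an "information compression" percentage might be computed; the record does not define it.

```python
import math


def coulomb_matrix(charges, coords):
    """Build a Coulomb matrix from nuclear charges and 3D coordinates.

    Uses the commonly cited definition; the off-diagonal units are the
    inverse of whatever length unit `coords` uses.
    """
    n = len(charges)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                # Diagonal term: 0.5 * Z_i ** 2.4
                matrix[i][j] = 0.5 * charges[i] ** 2.4
            else:
                # Off-diagonal term: Z_i * Z_j / |R_i - R_j|
                distance = math.dist(coords[i], coords[j])
                matrix[i][j] = charges[i] * charges[j] / distance
    return matrix


def compression(input_size, latent_size):
    """ASSUMED definition: fraction of the input dimensions removed."""
    return 1.0 - latent_size / input_size


# Illustrative input only: two hydrogen nuclei 0.74 length units apart.
h2 = coulomb_matrix([1, 1], [(0.0, 0.0, 0.0), (0.0, 0.0, 0.74)])
```

Under the assumed definition, an auto-encoder that maps a hypothetical 100-element input to a 25-element code would have `compression(100, 25)` equal to 0.75, that is, 75%.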

Language

  • English

ETD Degree Name

  • Bachelor of Science

ETD Degree Level

  • Bachelor

ETD Degree Discipline

  • Faculty of Science. Honours in Physics.
Degree Grantor
University of Prince Edward Island
Rights
author

Department