A key bioinformatics task in precision medicine is the precise taxonomy of diseases using genomic and other types of omics data. Machine learning methods, including feature selection from high-dimensional data, supervised classification, and unsupervised clustering, play major roles in this task.
Over the past 15 years, many small-scale genomics studies have applied such methods. They have illustrated the power of machine learning approaches in the field and have also revealed many pitfalls when the methods are applied improperly. Recent Precision Medicine initiatives in the US and several other nations are expected to scale up the data from genomics and other omics studies by several orders of magnitude. These data will also be joined by big data from medical practice. Applying machine learning methods efficiently and correctly is crucial for extracting correct knowledge from such big data. Meanwhile, the fields of machine learning and artificial intelligence have developed rapidly in recent years. New methods such as deep learning and probabilistic learning, among many others, have shown remarkable, close to human-level performance in tasks such as image recognition, text mining, and the analysis of big data on the internet. These new methods have high potential for applications to big biological data, but biological data have their own unique characteristics. Unlike kits and protocols for well-developed bench experiments, most advanced machine learning methods are unlikely to yield reliable new knowledge from big omics data if they are applied as automated tools without a real understanding of the principles behind the methods and of the biological question under investigation.
This tutorial will introduce the framework of machine learning theory, explain the major principles of classical and newly emerging machine learning methods, present some representative methods and their application examples in detail, and discuss their potential, open questions, and common pitfalls in applications to precision medicine studies.