Research ambitions in genomics are growing, alongside the size of genomic datasets and quantitative scientists’ capacity to analyze big data. An increasing number of large-scale projects are being launched to answer genomic and biomedical questions, some as broad and challenging as "how is the human genome transforming medicine?" To ensure the success of such projects, the feasibility of their goals must be evaluated in light of the available resources, including manpower, data, subject-matter knowledge, and statistical and computational methods.
In this short course, we will concentrate on essential high-dimensional statistical tools, as well as the data-handling and critical-thinking skills needed to carry out a large-scale genomic project. Two specific topics will be covered: 1) RNA-Seq data analysis, from normalization to higher-level analyses such as differential expression and cluster analysis; and 2) gene networks, from inference of network edges to locating functional sub-communities, or modules, of genes within the network. Along the way, we will also discuss how to bring qualitative biological and medical knowledge into the formulation of an appropriate statistical inference question, the foundation on which all subsequent computational and statistical methodology is built.