20171221加拿大阿尔伯特大学Dr. Guohui Lin学术报告
发布时间: 2017-12-18 浏览次数: 69

题目:Omics research and a scheduling problem inside

报告人:Dr. Guohui Lin



Dr. Guohui Lin is a tenured professor of Computing Science at the University of Alberta. He obtained his PhD in Theoretical Computer Science from the Chinese Academy of Sciences in 1997, and his MSc in Operations Research and BSc in Mathematics from the Zhejiang University in 1995 and 1993 respectively. He currently serves as the Chair of the Theoretical Computer Science group in the Natural Sciences and Engineering Research Council of Canada.  His main research interests are combinatorial optimization, operations research, approximation algorithm design and analysis and inapproximability, and bioinformatics. He is an Associate Editor of the Journal of Combinatorial Optimization and he chaired the 2007 Annual International Computing and Combinatorics Conference and the 2012 Annual International Conference on Combinatorial Optimization and Applications.  He has been the PI and co-PI for more than 15 major research grants, and has published more than 100 journal papers and more than 80 peer reviewed conference papers.


In our Omics Research lab, we have been processing terabytes of genomic, proteomic and metabolomic data for a number of species including cattle, e.coli and human. I will briefly discuss our research goals that generally fall into large (or big) data analyst, where statistics, machine learning and data mining techniques and internet based data management technology find useful. We are equally interested in optimization problems arising from these practices. In particular, we identify a scheduling problem called the three-machine openshop within the cattle genomic dataset generation procedure. In this problem, a job represents a cow and a machine represents a data processing step; a job needs to be processed non-preemptively by every machine for a certain length of time, in whichever machine order, with the goal to minimize the makespan. The makespan represents the time when the dataset becomes ready for the downsteam data analysis such as genome-wide association studies.

The formulated scheduling problem is weakly NP-hard and it admits a polynomial-time approximation scheme. When each job has an equal processing time across all three machines, the openshop is called proportionate. We first show the important role of the largest job in a linear-time procedure that produces a schedule, which is optimal in many cases; we then propose an algorithm to deal with the rest of the cases using existing approximation algorithms for the minimum and the maximum knapsack problems. Combining them together, we achieve a fully polynomial-time approximation scheme for the three-machine proportionate openshop problem.