Welcome to Seoul, Korea!

Seoul National University December 15-16, 2016

The 2nd Pacific Rim Statistical Conference for Production Engineering
Production Engineering, Big Data & Statistics

12/14, 2016: Short course on Statistical Process Charts and Big Data, International Conference Room, Building 25-1

12/15 – 16, 2016: Conference, International Conference Room (Building 25-1) and Room 25-210 (Building 25), Seoul National University, Seoul, Korea

Sponsors:

- SNU Data Science for Knowledge Creation Research Center, Seoul National University
- The Korean Academy of Science and Technology

Co-sponsors:

- International Chinese Statistical Association
- Korean International Statistical Society
- Quality & Productivity Section, American Statistical Association
- Industrial and Business Statistics Section, The Korea Statistical Society
- The Korean Society for Quality Management

Welcome to the 2nd Pacific Rim Statistical Conference for Production Engineering! The Pacific Rim is one of the key manufacturing regions in the world, and applications of statistical thinking and methods to production engineering have never been more important in the era of Big Data. To address this need, a statistical conference for production engineering was first proposed by Prof. George Tiao during his opening remarks at the 2014 Joint Applied Statistics Symposium of the International Chinese Statistical Association and the Korean International Statistical Society in Portland. The first conference was held at the Shanghai Center for Mathematical Sciences, located at Fudan University, in December 2014, organized by Zhiliang Yang, Tze Leung Lai and others. Internationally renowned speakers and discussants in the fields of statistics and engineering were invited. Following the success of the first conference, the 2nd Pacific Rim Statistical Conference for Production Engineering will feature renowned speakers from statistics and engineering around the region, showcasing their work with Big Data in production engineering. Each session will have ample time for discussion with all participants. We hope all participants enjoy the conference and have a great time.

Thanks for coming to the conference!


The conference committee:

- Dongseok Choi (Oregon Health & Science University)
- Daeheung Jang (Pukyong National University)
- Tze Leung Lai (Stanford University)
- Mei-Ling Ting Lee (University of Maryland)
- Youngjo Lee (Seoul National University)
- Ying Lu (Stanford University)
- Jun Ni (University of Michigan)
- Peter Qian (University of Wisconsin)
- Peihua Qiu (University of Florida)
- George Tiao (The University of Chicago)

The science & local committee:

- Suk Joo Bae (Hanyang University)
- Sangbum Choi (Korea University)
- Donghwan Lee (Ewha Womans University)
- Johan Lim (Seoul National University)
- Maengseok Noh (Pukyong National University)
- Joong-Ho Won (Seoul National University)


Campus map (legend):
- Blue arrow: Hoam Faculty House (hotel)
- Orange arrow: International Conference Room, Building 25-1
- Yellow arrow: Jahayon, Building 109 (lunch)
- Black arrow: Room 210, Building 25


Overview

12/14, Wednesday - International Conference Room (ICR), Building 25-1
- 9:00 AM - 6:00 PM: Short Course

12/15, Thursday - ICR, Building 25-1 and Room 25-210, Building 25
- 8:30 - 8:50 AM: Welcome Session
- 8:50 - 9:30 AM: Keynote Session
- 9:40 - 11:20 AM: Session 1: Design and Collection of Big Data (ICR) | Parallel Session 1: Statistical Theories Related with the Big Data (25-210)
- 11:20 - 11:40 AM: Break
- 11:40 AM - 12:40 PM: Discussion for Session 1
- 12:40 - 2:10 PM: Lunch (Jahayon, Building 109)
- 2:10 - 3:50 PM: Session 2: Analytic Methods of Big Data (ICR) | Parallel Session 2: Recent Advances in Statistical Methods (25-210)
- 3:50 - 4:10 PM: Break
- 4:10 - 5:10 PM: Discussion for Session 2
- 6:00 PM: Reception/Dinner (Solbatsikdang)

12/16, Friday - ICR, Building 25-1 and Room 25-210, Building 25
- 9:00 - 10:40 AM: Session 3: Operation/Production Decision Making (ICR) | Parallel Session 3: Development of Statistical Software in Korea for the Big Data (25-210)
- 10:40 - 11:00 AM: Break
- 11:00 AM - 12:00 PM: Discussion for Session 3 (ICR) | Parallel Session 4: Big Data and Its Application to Industry (25-210)
- 12:00 - 1:30 PM: Lunch (Jahayon, Building 109)
- 1:30 - 3:10 PM: Session 4: Reliability (ICR) | Parallel Session 5: Quality Monitoring with Big Data (25-210)
- 3:10 - 3:30 PM: Break
- 3:30 - 4:30 PM: Discussion for Session 4 (ICR)
- 4:30 - 4:45 PM: Closing Remarks (ICR)
- 4:45 - 5:10 PM: Discussion for Next Conference (ICR)
- 6:00 PM: Dinner (Jayunbyulgok)


12/15/2016, Thursday

Welcome Session
Time and Location: 8:30 – 8:50 AM, 25-210
Chair: Dongseok Choi, Oregon Health & Science University

- Youngjo Lee, Seoul National University
- George Tiao, The University of Chicago

Keynote Session
Time and Location: 8:50 – 9:30 AM, ICR
Chair: Daeheung Jang, Pukyong National University

- Sung Hyun Park, PhD, Fellow of the American Statistical Association; President, Social Responsibility and Management Quality Institute; Professor Emeritus, SNU
  o Title: The Past and Present Quality Management in Korea, and New Quality Innovation by Big Data Management


- Session 1: Design and Collection of Big Data
  Time and Location: 9:40 – 11:20 AM, ICR
  Organizer: Peter Qian (University of Wisconsin)
  Chair: Ying Lu (Stanford University)

  - Jae-Kwang Kim, Iowa State University ([email protected])
    o Title: Bottom-up Estimation and Top-down Prediction in Multi-level Models: Solar Energy Prediction Combining Information from Multiple Sources
  - Teruo Mori, The Mori Consulting Office, Japan ([email protected])
    o Title: The 62% Problems of SN-ratio and New Conference-Matrix for Optimization: To Reduce Experiment Numbers and to Increase Reliability for Optimization
  - Youngdeok Hwang, IBM T. J. Watson Research Center ([email protected])
    o Title: Statistical-Physical Estimation of Pollution Emission
  - Qing Liu, Wisconsin School of Business, University of Wisconsin - Madison ([email protected])
    o Title: Sequential Sampling Enhanced Composite Likelihood Approach to Estimation of Social Intercorrelations in Large-Scale Networks


Bottom-up Estimation and Top-down Prediction in Multi-level Models: Solar Energy Prediction Combining Information from Multiple Sources
Jae-Kwang Kim
Iowa State University & Korea Advanced Institute of Science and Technology

Accurately forecasting solar power from multiple sources using a statistical method is an important but challenging problem. Our goal is to combine two different physics-model forecasting outputs with real measurements from an automated monitoring network so as to better predict solar power in a timely manner. To this end, we propose a bottom-up approach for analyzing large-scale multilevel models with great computational efficiency, requiring minimal monitoring and intervention. This approach divides the large-scale data set into smaller ones of manageable size, based on their physical locations, and fits a local model in each area. The local model estimates are then combined sequentially from the specified multilevel models using our novel bottom-up approach for parameter estimation. The prediction, on the other hand, is implemented in a top-down manner. The proposed method is applied to the solar energy prediction problem for the U.S. Department of Energy's SunShot Initiative.
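As a rough illustration of the bottom-up idea only (hypothetical data layout and variable names, not the authors' implementation), one can fit an ordinary least-squares model in each geographic area and combine the area-level estimates by precision weighting:

```python
import numpy as np

def fit_local_models(X_by_area, y_by_area):
    """Fit an OLS model in each area; return per-area estimates and covariances."""
    estimates = {}
    for area in X_by_area:
        X, y = X_by_area[area], y_by_area[area]
        XtX = X.T @ X
        beta = np.linalg.solve(XtX, X.T @ y)
        resid = y - X @ beta
        sigma2 = resid @ resid / (len(y) - X.shape[1])
        estimates[area] = (beta, sigma2 * np.linalg.inv(XtX))
    return estimates

def combine_bottom_up(estimates):
    """Combine area-level estimates by inverse-variance (precision) weighting."""
    precisions = [np.linalg.inv(cov) for _, cov in estimates.values()]
    total_prec = sum(precisions)
    weighted = sum(P @ beta for P, (beta, _) in zip(precisions, estimates.values()))
    return np.linalg.solve(total_prec, weighted)
```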


The 62% Problems of SN-ratio and New Conference-Matrix for Optimization: To Reduce Experiment Numbers and to Increase Reliability for Optimization
Teruo Mori
Mori Consulting

The SN-ratio has been applied in robust optimization to evaluate variation due to noise factors. Robust design follows a simple two-step process to obtain optimum conditions: first, combine the best levels of the SN-ratio for the control factors; second, tune to the target. We therefore expected the optimum condition to exceed the best SN-ratio of the originating orthogonal array, such as L18. However, in 62% of robust design cases the optimum conditions were lower than that best value. We call this the "62% problem". We analyzed the causes of the 62% problem and identified six items affecting the SN-ratio; these six causes will be introduced through small case studies in the paper and presentation. In addition, to reduce the number of experiments and to increase the reliability of optimization, we have applied conference matrices such as C4(2133) and C6(2135) as design matrices for optimization, and we are obtaining successful cases with better results. Some of these cases were presented at JSQC (Japan) in 2016. The new conference matrix and the case studies will also be introduced.
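For readers unfamiliar with the terminology, the standard textbook signal-to-noise ratios referred to in the abstract are (general Taguchi definitions, not formulas specific to this talk):

```latex
% Smaller-the-better, larger-the-better, and nominal-the-best SN ratios
\[
\mathrm{SN}_{\text{smaller}} = -10\,\log_{10}\!\Big(\tfrac{1}{n}\sum_{i=1}^{n} y_i^{2}\Big),\qquad
\mathrm{SN}_{\text{larger}} = -10\,\log_{10}\!\Big(\tfrac{1}{n}\sum_{i=1}^{n} \tfrac{1}{y_i^{2}}\Big),\qquad
\mathrm{SN}_{\text{nominal}} = 10\,\log_{10}\!\Big(\tfrac{\bar{y}^{2}}{s^{2}}\Big).
\]
```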


Statistical-Physical Estimation of Pollution Emission
Youngdeok Hwang
IBM T. J. Watson Research Center

Air pollution is driven by non-local dynamics, in which air quality at a site is determined by the transport of pollutants from distant emission sources to the site by atmospheric processes. In order to understand the underlying nature of pollution generation, it is crucial to employ physical knowledge to account for the pollution transport by wind. However, in most cases it is not possible to utilize physics models directly to obtain useful information, as they require massive calibration and computation. In this paper, we propose a method to estimate the pollution emission from the domain of interest by using both physical knowledge and observed data. The proposed method uses an efficient optimization algorithm to estimate the emission from each of the spatial locations while incorporating the physics knowledge. The proposed approach is demonstrated through a simulation study that mimics the real application.
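A toy sketch of the data-plus-physics estimation idea, under the strong simplifying assumption that transport can be summarized by a known linear operator A (the matrix below is a random stand-in, not the authors' physics model), reduces emission recovery to non-negative least squares:

```python
import numpy as np
from scipy.optimize import nnls

# Hypothetical setup: y = A e + noise, where A encodes wind-driven transport
# from each source location to each monitoring site, and e >= 0 are emissions.
rng = np.random.default_rng(0)
n_sites, n_sources = 50, 20
A = np.abs(rng.normal(size=(n_sites, n_sources)))   # stand-in transport matrix
e_true = rng.uniform(0, 5, size=n_sources)
y = A @ e_true + rng.normal(scale=0.1, size=n_sites)

e_hat, residual_norm = nnls(A, y)                    # non-negative emission estimates
print(np.round(e_hat, 2))
```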


Sequential Sampling Enhanced Composite Likelihood Approach to Estimation of Social Intercorrelations in Large-Scale Networks
Qing Liu
Wisconsin School of Business, University of Wisconsin - Madison

The increasing access to large social network data has generated substantial interest in the marketing community. However, due to its large scale, traditional analysis methods often become inadequate. In this paper, we propose a sequential sampling enhanced composite likelihood approach for efficient estimation of social intercorrelations in large-scale networks using the spatial model. The proposed approach sequentially takes small samples from the network, and adaptively improves model parameter estimates through learnings obtained from previous samples. In comparison to population-based maximum likelihood estimation that is computationally prohibitive when the network size is large, the proposed approach makes it computationally feasible to analyze large networks and provide efficient estimation of social intercorrelations among members in large networks. In comparison to sample-based estimation that relies on information purely from the sample and produces underestimation bias in social intercorrelation estimates, the proposed approach effectively uses information from the population without compromising computation efficiency. Through simulation studies based on simulated networks and real networks, we demonstrate significant advantages of the proposed approach over benchmark estimation methods and discuss managerial implications.

Evaluating the performance of a classification method for a given data structure is a common problem in statistics. This task is often done by using multi-fold cross-validation. We propose an experimental design strategy, called sliced cross-validation, to improve this approach. The proposed method borrows the sliced Latin hypercube design idea, originally developed in computer experiments, to obtain a structured multi-fold cross-validation sample in which the input values in each fold possess attractive space-filling properties. Numerical examples are provided to bear out the effectiveness of the proposed methodology for a catalog of popular classification rules, including linear and quadratic discriminant analysis, naive Bayes, nearest neighbor classifiers, and support vector machines. This is joint work with Peter Qian from the University of Wisconsin-Madison.


- Parallel Session 1: Statistical Theories Related with the Big Data
  Time and Location: 9:40 – 11:20 AM, 25-210
  Organizer: SNU Data Science for Knowledge Creation Research Center
  Chair: Sung-im Lee, Dankook University

  - Johan Lim, Seoul National University ([email protected])
    o Title: Permutation Based Test on Covariance Separability
  - DongHwan Lee, Ewha Womans University ([email protected])
    o Title: Error Control in Latent Class Analysis: Application to Korean Adolescents with Internet and Smartphone Addiction
  - Chi Tim Ng, Chonnam University ([email protected])
    o Title: High Dimensional Factor Analysis of Autoregressive Moving Average Time Series Models


Permutation Based Test on Covariance Separability
Johan Lim
Seoul National University

Separability is an attractive feature of a covariance matrix for matrix-variate data, which can improve and simplify many multivariate procedures. Due to its importance, testing separability has received much attention in the literature. Existing procedures are based on the likelihood ratio test (LRT) under normality and aim to find a good approximation to its null distribution. In this paper, we propose a new procedure that is very different from existing approaches. We propose to rewrite the null hypothesis (the separability of the covariance matrix) as many sub-hypotheses (the separability of sub-matrices of the covariance matrix), which are testable using a permutation-based procedure. We then combine the testing results of the sub-hypotheses using the Bonferroni and multi-stage additive procedures. Our procedure has at least two advantages over the existing LRT under normality. It is based on a permutation procedure and is inherently distribution free; thus it is robust to nonnormality of the data. In addition, unlike the LR test, it is applicable to data whose sample size is smaller than the number of unknown parameters in the covariance matrix. The numerical study and real data examples show these advantages of our procedure over the existing LR test under normality. This is joint work with Seongoh Park, Xinlei Wang, and Sanghan Lee.
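The sub-hypothesis combination step can be sketched generically as follows (illustrative only; the separability test statistic itself is not reproduced here, and stat_fn is a placeholder):

```python
import numpy as np

def permutation_pvalue(stat_fn, sample_a, sample_b, n_perm=2000, rng=None):
    """Two-sample permutation p-value for a generic statistic stat_fn."""
    rng = rng or np.random.default_rng(0)
    observed = stat_fn(sample_a, sample_b)
    pooled = np.concatenate([sample_a, sample_b])
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)
        a, b = perm[:len(sample_a)], perm[len(sample_a):]
        if stat_fn(a, b) >= observed:
            count += 1
    return (count + 1) / (n_perm + 1)

def bonferroni_combine(pvalues, alpha=0.05):
    """Reject the global null if any sub-hypothesis p-value falls below alpha / m."""
    m = len(pvalues)
    return any(p < alpha / m for p in pvalues)
```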


Error Control in Latent Class Analysis: Application to Korean Adolescents with Internet and Smartphone Addiction
DongHwan Lee
Ewha Womans University

Error control has been a main focus in multiple testing, while the Bayes rule, which minimizes the total misclassification error, is often used in latent class analysis. Although the primary goal of latent class analysis is to assign a set of data into unlabeled groups, in many applications it is also more important to avoid misclassification errors for a specific group than for the others. Thus, it is desirable to treat misclassification errors for certain groups differently. In this talk, we provide a unified view of latent class analysis and multiple testing by introducing a discrete random-effects model, and study how to define and control various error rates in latent class analysis. We show how to obtain optimal decision rules controlling these errors properly in latent class analysis via the extended likelihood approach. We applied this procedure to a data example of Korean adolescents with internet and smartphone addiction. To classify Korean adolescents into multiple subgroups, we use the level of internet and smartphone addiction in the latent class analysis while controlling the proposed error rates, and we investigate the statistical differences in personality and clinical features such as the behavioral inhibition/activation systems, depression, anxiety and impulsivity.


High Dimensional Factor Analysis of Autoregressive Moving Average Time Series Models
Chi Tim Ng
Chonnam University

In this talk, I present new results on likelihood inference for high-dimensional factor analysis of time series data. In the case where the latent process is driven by a lower-dimensional Gaussian vector-valued autoregressive process, a new matrix decomposition technique is introduced to obtain expressions for the likelihood function and its derivatives. With such expressions, the traditional delta method, which relies heavily on the score function and Hessian matrix, can be extended to high-dimensional cases. Asymptotic theories, including consistency and asymptotic normality, are developed. Moreover, an iterative O(np) computational algorithm is developed for estimation. Applications to high-dimensional stock price data and portfolio analysis are discussed.
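In generic notation (a standard dynamic factor model written down only for orientation, not the exact model of the talk), the setting is:

```latex
\[
y_t = \Lambda f_t + \varepsilon_t, \qquad
f_t = A f_{t-1} + \eta_t, \qquad
\varepsilon_t \sim N(0,\Psi),\ \ \eta_t \sim N(0,Q),
\]
% y_t in R^n with n large, f_t in R^k with k << n, and Lambda the n x k loading matrix.
```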


- Session 2: Analytic Methods of Big Data
  Time and Location: 2:10 – 3:50 PM, ICR
  Organizers: Youngjo Lee (Seoul National University), Daeheung Jang (Pukyong National University)
  Chair: Daeheung Jang (Pukyong National University)

  - Dong Soo Lee, Department of Nuclear Medicine, Seoul National University, Korea ([email protected])
    o Title: Clinical Use of Big Data
  - Youngjo Lee, Department of Statistics, Seoul National University, Korea / Maengseok Noh, Department of Statistics, Pukyong National University, Korea / Hee-Joon Bae, Department of Neurology, Seoul National University, Korea ([email protected])
    o Title: The Real Time Tracking and Alarming the Early Neurologic Deterioration using Blood Pressure Monitoring in Patient with Acute Ischemic Stroke
  - Joong-Ho Won, Department of Statistics, Seoul National University, Korea ([email protected])
    o Title: Towards a Sparse, Scalable, and Stably Positive Definite (Inverse) Covariance Estimator
  - Ching-Kang Ing, Institute of Statistical Science, Academia Sinica, Taiwan ([email protected])
    o Title: On High-dimensional Cross-validation


Clinical Use of Big Data
Dong Soo Lee
Department of Molecular Medicine and Biopharmaceutical Sciences

Using big omics data and big physiologic data for clinical purposes is on the horizon. Profuse omics data are now available, with whole genome sequencing costing less than $1,000. We can diagnose idiopathic diseases and their genomic causes, and thus prevention of catastrophic events might become possible. Sooner or later, prevention of mortality due to noncommunicable disease will be affordable. Pharmacogenomics will affect the choice of drugs for relatively rare diseases and even very common diseases such as cancer and hypertension. To realize these near-future prospects, our society should first allow the production of big omics data, their transfer, storage and analysis, the generation of hypotheses, and finally analysis for clinical use. Beyond these policy issues, the most important step will be the delivery of all these data into the hands of clinicians who have been trained properly to help patients understand the impact of their big data. Thus, our community should invest in the next generation of clinicians so that they are well prepared to handle these data. Unmet clinical needs will be defined by the clinicians using big omics data and will be solved by the same clinicians. The big omics data include single-cell omics data, transcriptomics data, and metagenomics data of commensals in the gut and other body sites, in addition to the mutation calls of cancer patients and tissues. Clinical usage also includes improving lifestyle for the primary prevention of diseases and expanding the wellness of already healthy subjects, not only treating the sick.

Construction of the software-based, ultimately digital hospital is also being implemented in globally or domestically leading hospitals, enabling the inclusion of immense physiologic signal data for helping patients. Patients need neither explain their everyday life to the clinicians nor bring data to the clinic; their in vivo physiologic monitoring data are now available in cloud storage. What the clinician needs now is not the data themselves but a synopsis of the physiologic big data interpreted by artificial intelligence. Preliminary applications such as clinical pathways and the monitoring of drug interactions and idiopathic side effects are moving toward selecting drugs and their doses with reference to the genomic constitution of the patient. This hospital-based strategy of using big data is sure to expand to individual clinics or even the community. What should be done by society and government is to make cloud computing available, to enable super-speed wireless networks, and to make wearable monitoring hardware and software readily affordable. The next generation of medical personnel, mainly clinicians, must now be trained and prepared to adopt technological advances, whatever they are, for the benefit of patients, healthy subjects, and ultimately the public. Those people will then be responsible for reaching consensus to make these changes feasible and desired; this includes, for example, amending the rules on using or prohibiting the use of individuals' private information. Each country's policy will make a difference in the near future of clinical medicine.

The Real Time Tracking and Alarming the Early Neurologic Deterioration using Blood Pressure Monitoring in Patient with Acute Ischemic Stroke
Youngjo Lee, Maengseok Noh, Il-Do Ha
Department of Statistics, Seoul National University, Seoul, Korea; Department of Statistics, Pukyong National University, Busan, Korea

Approximately 30% of patients hospitalized for ischemic stroke are at risk of early neurologic deterioration (END) during their hospital stay. END leads to serious adverse outcomes, such as an extended duration of hospitalization, an increased demand for resources, aggravated neurologic disability, and death. A real-time monitoring system that predicts and raises an alarm before an occurrence of END would be useful for delineating patients at high risk in an acute stroke care system. Thus, it is useful to develop a real-time prediction of END using continuous blood pressure (BP) monitoring and clinical parameters, and to set an alarm criterion before END. For this purpose, the joint modelling of mean and dispersion (Lee and Nelder, 2001) for the analysis of data from quality-improvement experiments is extended. For modeling BP changes, we extend hierarchical generalized linear models (HGLMs) with temporal correlations at each time point of BP measurement, taken at irregular intervals after arrival at the emergency room. These temporal correlations were used not only for the mean function but also for the variance function. We used 1,805 subjects, of whom 18.3% experienced END. END patients have a higher mean as well as variance in BP than non-END patients. A logistic model was considered for prediction, with covariates including age, sex, history of stroke, time to arrival, baseline NIHSS score, diabetes mellitus, initial glucose level, atrial fibrillation, leukocyte count, stroke subtype, recanalization therapy, the location of the symptomatic vessel, and the estimated mean and standard deviation of systolic BP at the measured time point. Using this prediction model, 85% of patients received an alarm within 24 hours before the occurrence of END.

Key words: Blood pressure, Early neurologic deterioration, Hierarchical generalized linear models, Ischemic stroke, Temporal correlation.


Towards a Sparse, Scalable, and Stably Positive Definite (Inverse) Covariance Estimator
Joong-Ho Won
Department of Statistics, Seoul National University, Seoul, Korea

High dimensional covariance estimation and graphical models are contemporary topics in statistics and machine learning with widespread applications. An important line of research in this regard is to shrink the extreme spectrum of the covariance matrix estimators. A separate line of research in the literature has considered sparse inverse covariance estimation, which in turn gives rise to graphical models. In practice, however, a sparse covariance or inverse covariance matrix that is simultaneously well-conditioned and computationally tractable is desired. There has been little research at the confluence of these three topics. In this paper we consider imposing a condition number constraint on various types of losses used in covariance and inverse covariance matrix estimation. When the loss function can be decomposed as a sum of an orthogonally invariant function of the estimate and its inner product with a function of the sample covariance matrix, we show that a solution path algorithm can be derived, involving a series of ordinary differential equations. The path algorithm is attractive because it provides the entire family of estimates for all possible values of the condition number bound, at the same computational cost as a single estimate with a fixed upper bound. An important finding is that the proximal operator for the condition number constraint, which turns out to be very useful in regularizing loss functions that are not orthogonally invariant and may yield non-positive-definite estimates, can be efficiently computed by this path algorithm. As a concrete illustration of its practical importance, we develop an operator-splitting algorithm that imposes a guarantee of well-conditioning as well as positive definiteness on recently proposed convex pseudo-likelihood based graphical model selection methods.
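As one concrete instance of the constrained problems discussed (the Gaussian log-likelihood loss, written here only for illustration), the condition-number-constrained inverse covariance estimator solves:

```latex
\[
\hat{\Omega}_\kappa \;=\; \arg\min_{\Omega \succ 0}\;\; \operatorname{tr}(S\,\Omega) - \log\det\Omega
\quad\text{subject to}\quad
\operatorname{cond}(\Omega)=\frac{\lambda_{\max}(\Omega)}{\lambda_{\min}(\Omega)} \le \kappa,
\]
% S is the sample covariance matrix and kappa >= 1 is the condition-number bound.
```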


On High-dimensional Cross-validation
Ching-Kang Ing
Institute of Statistical Science, Academia Sinica

Cross validation (CV) has been one of the most popular methods for model selection. By splitting n data points into a training sample of size n_{c} and a validation sample of size n_{v}, in which n_{v}/n approaches 1 and n_{c} tends to infinity, Shao (1993) showed that subset selection based on CV is consistent in a regression model with a fixed number p of candidate variables. When p > n, however, not only does CV's consistency remain undeveloped, but subset selection is also practically infeasible. Instead of subset selection, in this talk, we suggest using CV as a backward elimination tool for excluding redundant variables that enter regression models through high-dimensional variable screening methods such as LASSO, LARS, ISIS and OGA. By choosing an n_{v} such that n_{v}/n converges to 1 at a rate faster than the one suggested by Shao (1993), we establish the desired consistency property. We further illustrate the finite sample performance of the proposed procedure via Monte Carlo simulations. Moreover, applications of our method to the analysis of wafer yields are also provided.
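A simplified sketch of using CV as a backward elimination tool after variable screening (illustrative only: plain LASSO screening via scikit-learn and an arbitrary validation fraction, not the rate conditions of the talk):

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression
from sklearn.model_selection import train_test_split

def screen_then_backward_cv(X, y, val_fraction=0.9, alpha=0.1):
    """LASSO screening followed by CV-style backward elimination of variables."""
    active = list(np.flatnonzero(Lasso(alpha=alpha).fit(X, y).coef_))
    X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=val_fraction, random_state=0)

    def val_error(cols):
        if not cols:
            return np.mean((y_val - y_tr.mean()) ** 2)
        fit = LinearRegression().fit(X_tr[:, cols], y_tr)
        return np.mean((y_val - fit.predict(X_val[:, cols])) ** 2)

    improved = True
    while improved and active:
        improved = False
        base = val_error(active)
        for j in list(active):
            if val_error([c for c in active if c != j]) <= base:
                active.remove(j)        # drop a redundant screened variable
                improved = True
                break
    return active
```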


- Parallel Session 2: Recent Advances in Statistical Methods
  Time and Location: 2:10 – 3:50 PM, 25-210
  Organizer: Dongseok Choi (Oregon Health & Science University)
  Chair: Dongseok Choi (Oregon Health & Science University)

  - Jong-Min Kim, Division of Science and Mathematics, University of Minnesota at Morris ([email protected])
    o Title: Control Chart for the Conditional Multivariate Data Using Copula Functions
  - Sungsu Kim, Department of Mathematics, University of Louisiana at Lafayette ([email protected])
    o Title: Clustering Methods for Directional Data: A Review
  - Sangbum Choi, Department of Statistics, Korea University, Korea ([email protected])
    o Title: A Semiparametric Inverse-Gaussian Model and Inference for Survival Data with a Cured Proportion


Control Chart for the Conditional Multivariate Data Using Copula Functions
Jong-Min Kim
Statistics Discipline, Division of Science and Mathematics, University of Minnesota at Morris

In this paper we discuss multivariate statistical process control via control charting. We review multivariate extensions of all kinds of univariate control charts, such as multivariate Shewhart-type control charts, multivariate CUSUM control charts and multivariate EWMA control charts. In addition, we consider conditional multivariate data obtained by utilizing several copula functions and apply the conditional multivariate transformed data to the control charts.
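For orientation, a bare-bones Hotelling T^2 (multivariate Shewhart-type) chart, one of the charts reviewed, can be sketched as follows; the copula-based conditional transformation of the talk is omitted and the control limit is a placeholder:

```python
import numpy as np

def hotelling_t2(X, mu, Sigma_inv):
    """T^2 statistic for each row of X against in-control mean mu and inverse covariance."""
    diff = X - mu
    return np.einsum('ij,jk,ik->i', diff, Sigma_inv, diff)

# Phase I: estimate in-control parameters from historical data (simulated here)
rng = np.random.default_rng(1)
X0 = rng.multivariate_normal([0, 0, 0], np.eye(3), size=200)
mu0, Sigma0 = X0.mean(axis=0), np.cov(X0, rowvar=False)

# Phase II: monitor new observations and flag points above an (assumed) control limit
X_new = rng.multivariate_normal([0.5, 0, 0], np.eye(3), size=50)
t2 = hotelling_t2(X_new, mu0, np.linalg.inv(Sigma0))
ucl = 12.8   # placeholder limit; in practice taken from an F or chi-square quantile
print(np.flatnonzero(t2 > ucl))
```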


Clustering Methods for Directional Data: An Overview and a New Generalization
Sungsu Kim
Department of Mathematics, University of Louisiana at Lafayette

Recent advances in data acquisition technologies have led to massive amounts of data being collected routinely in information science and technology, as well as in the engineering sciences. In this big data era, cluster analysis is a fundamental and crucial step in exploring structures and patterns in massive datasets, where the objects (data) to be clustered are represented as vectors. Often such high dimensional vectors are L2 normalized so that they lie on the surface of the unit hypersphere, transforming them into spherical data. Clustering such data is thus equivalent to grouping spherical data, where either cosine similarity or correlation is the desired metric for identifying similar observations, rather than Euclidean similarity metrics. In this talk, an overview of different clustering methods for spherical data in the literature will be presented. An alternative model-based generalization will also be presented.
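A minimal spherical k-means sketch (cosine-similarity clustering of L2-normalized vectors, written from its textbook description; just one of the methods covered in the overview):

```python
import numpy as np

def spherical_kmeans(X, k, n_iter=50, rng=None):
    """Cluster unit-normalized rows of X by maximizing cosine similarity to centroids."""
    rng = rng or np.random.default_rng(0)
    X = X / np.linalg.norm(X, axis=1, keepdims=True)          # project onto unit sphere
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = np.argmax(X @ centroids.T, axis=1)           # assign by cosine similarity
        for j in range(k):
            members = X[labels == j]
            if len(members):
                c = members.sum(axis=0)
                centroids[j] = c / np.linalg.norm(c)          # renormalize the centroid
    return labels, centroids
```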


A Semiparametric Inverse-Gaussian Model and Inference for Survival Data with a Cured Proportion
Sangbum Choi
Department of Statistics, Korea University

This work focuses on a semiparametric analysis of a cure rate modelling approach based on a latent failure process. In clinical and epidemiological studies, a Wiener process with drift may represent a patient's health status, and a clinical endpoint occurs when the process first reaches an adverse threshold state. The first hitting time then follows an inverse-Gaussian distribution. On the basis of the improper inverse-Gaussian distribution, we consider a process-based lifetime model that allows for a positive probability of no event taking place in finite time. Model flexibility is achieved by leaving a transformed time measure for disease progression completely unspecified, and regression structures are incorporated into the model by taking the acceleration factor and the threshold parameter as functions of the covariates. When applied to experiments with a cure fraction, this model is compatible with classical two-mixture or promotion-time cure rate models. We develop an asymptotically efficient likelihood-based estimation and inference procedure and derive the large-sample properties of the estimators. Simulation studies demonstrate that the proposed method performs well in finite samples. A case study of stage-III soft tissue sarcoma data is used as an illustration.
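For background, a standard first-hitting-time fact behind such cure models (a generic Wiener-process result, not the semiparametric model of the talk): with initial health level y_0 > 0, drift mu, variance sigma^2, and failure at the first hit of 0,

```latex
\[
P(T < \infty) \;=\;
\begin{cases}
1, & \mu \le 0,\\[2pt]
\exp\!\big(-2\,\mu\, y_0/\sigma^{2}\big), & \mu > 0,
\end{cases}
\qquad
T \sim \mathrm{IG}\!\left(\frac{y_0}{|\mu|},\,\frac{y_0^{2}}{\sigma^{2}}\right)\ \text{when } \mu < 0,
\]
% so a drift away from the threshold (mu > 0) yields a cured proportion 1 - exp(-2 mu y_0 / sigma^2).
```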


12/16/2016, Friday

- Session 3: Operation/Production Decision Making
  Time and Location: 9:00 – 10:40 AM, ICR
  Organizer: Jun Ni, University of Michigan
  Chair: Jun Ni, University of Michigan

  - Dragan Djurdjanovic, Department of Mechanical Engineering, University of Texas at Austin ([email protected])
    o Title: Condition Monitoring and Operational Decision-Making in Modern Semiconductor Manufacturing Systems
  - Seungchul Lee, Department of System Design and Control, UNIST ([email protected])
    o Title: Machine Learning and Data Visualization in Manufacturing
  - Jun Ni, Director, S. M. Wu Manufacturing Research Center, University of Michigan ([email protected])
    o Title: Intelligent Maintenance Methods for Production Systems
  - Tianrui Li, School of Information Science and Technology (SIST), Southwest Jiaotong University, China ([email protected])
    o Title: ST-MVL: A Spatio-temporal Multi-view Learning-based Method for Missing Value Imputation


Condition Monitoring and Operational Decision-Making in Modern Semiconductor Manufacturing Systems
Dragan Djurdjanovic
University of Texas at Austin

Modern semiconductor manufacturing tools are often complex systems of numerous interacting subsystems that operate in multiple physical domains and often follow highly nonlinear distributed dynamics. In such systems, traditional condition monitoring methods, which rely on a direct link between sensor readings and the underlying condition of the system, cannot be used. Rather, one must acknowledge that the available sensor readings are only stochastically related to the condition of the monitored system, which therefore must be probabilistically inferred from the sensors. This talk will describe a recently proposed condition monitoring method, based on characterizing the degradation process via a mixture of operation specific Hidden Markov Models (HMMs), with hidden states representing the unobservable degradation states of the monitored system, while its observable variables represent the available sensor readings. The new HMM based monitoring paradigm was applied to monitoring of several tools operating in major semiconductor fabs over multiple months, with orders of magnitude better performance over the traditional, purely signature-based approaches. The remainder of the talk will focus on describing how Markovian models of degradation of flexible manufacturing equipment, such as that utilized in modern semiconductor manufacturing, can be utilized to concurrently optimize the sequence of production operations and schedule preventive maintenance for that machine. It will be shown that integrated decision-making in terms of product sequencing and maintenance operations carries significant potential benefits compared to the more traditional, fragmented decision-making. The lecture will be ended with a brief summary of possible future research directions both in terms of process monitoring and in terms of operational decision-making in modern manufacturing.
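The basic modeling ingredient, fitting a hidden Markov model to sensor data and decoding latent degradation states, can be sketched with the hmmlearn package (a single generic Gaussian HMM on simulated readings; the operation-specific mixture of HMMs described in the talk is more elaborate):

```python
import numpy as np
from hmmlearn import hmm   # assumes the hmmlearn package is installed

# Hypothetical sensor matrix: rows are time-ordered multivariate readings from one tool.
rng = np.random.default_rng(0)
sensor_data = rng.normal(size=(500, 6))

model = hmm.GaussianHMM(n_components=3, covariance_type="full", n_iter=100)
model.fit(sensor_data)                           # learn emission and transition parameters
degradation_states = model.predict(sensor_data)  # most likely hidden state sequence (Viterbi)
print(np.bincount(degradation_states))
```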


Machine Learning and Data Visualization in Manufacturing
Seungchul Lee
Ulsan National Institute of Science and Technology (UNIST)

Manufacturing systems exhibit ever more complex, dynamic, and sometimes even chaotic behaviors. In order to satisfy the demand for high performance and quality in an efficient manner, it is essential to utilize all possible means available. One area which has shown fast-paced development, in terms of not only promising results but also usability, is machine learning. However, the many different algorithms, theories, and methods prevent many mechanical and industrial practitioners and researchers from adopting these powerful tools, and may thus hinder the utilization of the vast amounts of data increasingly available from manufacturing processes. This talk demonstrates successful case studies of manufacturing systems with machine learning (supervised, unsupervised, and deep learning). A special focus is on machine condition monitoring and fault diagnosis in a manufacturing environment.


Intelligent Maintenance Methods for Production Systems
Jun Ni
Director, S. M. Wu Manufacturing Research Center, University of Michigan

Health assessment and health management of engineering assets continue to be a challenging issue in many industries today. The health condition and performance degradation of engineering assets are largely invisible to human users until issues and faults are detected and diagnostic actions follow. Therefore, there is a strong need to develop technologies that make such information visible so that appropriate health management activities can be planned before catastrophic failures occur. When smart products and machines are networked and remotely monitored, and when various sources of data are analyzed with sophisticated systems, it is possible to go beyond mere "fail-and-fix" or blind Preventive Maintenance (PM) to more predictive and proactive maintenance (PdM): pinpointing exactly which components are likely to fail and when, and autonomously triggering service and other proactive actions. With the goal of achieving self-aware and near zero-breakdown products or systems, the Intelligent Maintenance Systems (IMS) Center has developed various tools and techniques to enhance intelligent health assessment, diagnostics/prognostics, and health management capabilities for components, machines, and production systems. This presentation will provide an overview of Intelligent Maintenance Systems and introduce several advanced diagnostics & prognostics tools and health management systems for engineering assets in various applications.


ST-MVL: A Spatio-temporal Multi-view Learning-based Method for Missing Value Imputation
Tianrui Li
School of Information Science and Technology (SIST), Southwest Jiaotong University, China

Many sensors have been deployed in the physical world, generating massive geo-tagged time series data. In reality, sensor readings are often lost at various unexpected moments because of sensor or communication errors. Those missing readings not only affect real-time monitoring but also compromise the performance of further data analysis. In this talk, we will present a spatio-temporal multi-view learning (ST-MVL) method to collectively fill missing readings in a collection of geo-sensory time series data, considering 1) the temporal correlation between readings at different timestamps in the same series and 2) the spatial correlation between different time series. Our method combines empirical statistical models, consisting of Inverse Distance Weighting and Simple Exponential Smoothing, with data-driven algorithms, comprising User-based and Item-based Collaborative Filtering. The former models handle general missing cases based on empirical assumptions derived from historical data over a long period, standing for two global views from spatial and temporal perspectives respectively. The latter algorithms deal with special cases where the empirical assumptions may not hold, based on recent contexts of the data, denoting two local views from spatial and temporal perspectives respectively. The predictions of the four views are aggregated into a final value in a multi-view learning algorithm. We evaluate our method on Beijing air quality and meteorological data, finding advantages of our model compared with ten baseline approaches.
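A toy rendering of two of the four views (inverse distance weighting as a spatial view and simple exponential smoothing as a temporal view, combined with fixed equal weights; the collaborative-filtering views and the learned aggregation of ST-MVL are omitted):

```python
import numpy as np

def idw_estimate(values_at_other_sensors, distances, power=2.0):
    """Spatial view: inverse distance weighted average of readings at other sensors."""
    w = 1.0 / np.power(distances, power)
    return np.sum(w * values_at_other_sensors) / np.sum(w)

def ses_estimate(past_values, alpha=0.3):
    """Temporal view: simple exponential smoothing of the sensor's own recent readings."""
    level = past_values[0]
    for v in past_values[1:]:
        level = alpha * v + (1 - alpha) * level
    return level

def fill_missing(spatial_vals, distances, past_vals):
    """Aggregate the two views with equal weights (ST-MVL learns the weights instead)."""
    return 0.5 * idw_estimate(spatial_vals, distances) + 0.5 * ses_estimate(past_vals)

# Example: estimate a missing reading from three nearby sensors and the recent history.
print(fill_missing(np.array([42.0, 55.0, 47.0]), np.array([1.2, 3.4, 2.1]),
                   [50.0, 48.0, 46.0, 44.0]))
```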


- Parallel Session 3: Development of Statistical Software in Korea for the Big Data
  Time and Location: 9:00 – 10:40 AM, 25-210
  Organizer: SNU Data Science for Knowledge Creation Research Center
  Chair: Hyungjoo Kim, Seoul National University

  - Il Do Ha, Pukyong National University ([email protected])
    o Title: Analysis of Various Data using a New Statistical Package (Albatross Analytics)
  - Jinwook Seo, Seoul National University ([email protected])
    o Title: Interactive Visualizations for Data Analysis in ALBATROSS ANALYTICS
  - Woojoo Lee, Inha University ([email protected])
    o Title: Comparing Albatross with Existing Statistical Softwares


Analysis of Various Data using a New Statistical Package (Albatross Analytics)
Il Do Ha
Pukyong National University

In this talk we introduce a new statistical package, "Albatross Analytics", which has recently been developed as an R GUI by the Center for Data Science and Knowledge Discovery (Director: Youngjo Lee). This package provides menu-based frameworks for analyzing data collected from various areas such as bio-medicine, finance, social science and engineering; it includes data visualization, basic statistical analysis, multivariate analysis, disease mapping, spatial analysis, time series analysis, survival analysis, quality control and reliability, etc. Furthermore, it also provides analyses for various statistical models including linear models, generalized linear models, linear mixed models and mean-variance joint models. In particular, it is a very competitive package using powerful computing algorithms based on hierarchical likelihood (h-likelihood; Lee and Nelder, 1996), thereby making it possible to implement complex random-effect models (e.g. HGLMs, double HGLMs, semi-parametric frailty models, competing-risks models, and joint modelling of longitudinal and survival outcomes), which are difficult to implement via existing commercial packages (e.g. SAS or SPSS).


Interactive Visualizations for Data Analysis in ALBATROSS ANALYTICS
Jinwook Seo
Seoul National University

Interactive visualization is a core technology for effective data analysis. There are two main opportunities for exploiting visualization in a general data analysis cycle: (1) to study high-level characteristics of data (i.e., variables and their relationships) before in-depth data analysis, and (2) to gain insights from (statistical) data analysis results. The widely used statistical computing language R has, however, limitations in supporting interactive visualizations. In this talk, we introduce three key components in ALBATROSS ANALYTICS for effective interactive data analysis: (1) fundamental statistical graphs (histogram, boxplot, and scatterplot) along with interactive visualization techniques (overview, coordination, and dynamic queries), (2) interactive map visualization for effective spatio-temporal data analysis, and (3) a visual exploratory data analysis framework called rank-by-feature for interactively finding interesting (often unexpected) features in multidimensional datasets.


Comparing Albatross with Existing Statistical Softwares
Woojoo Lee
Inha University

Statistical analysis software packages such as Stata and SPSS have been commonly used in academia and industry. Recently, Albatross, an integrated solution for data analysis, has been developed. In this discussion, we compare Albatross with existing statistical software in terms of user friendliness and the diversity of statistical models. Finally, we discuss which strengths of existing software should be reflected in Albatross in order for it to become a competitive alternative.


- Parallel Session 4: Big Data and Its Application to Industry
  Time and Location: 10:40 AM – 12:00 PM, 25-210
  Organizer: SNU Data Science for Knowledge Creation Research Center
  Chair: Hee-seok Oh, Seoul National University

  - Yongtae Park, Seoul National University ([email protected])
    o Title: Toward a Better Understanding of Weak Signal: What Weak Signal is and How to Detect it
  - Jonghun Park, Seoul National University ([email protected])
    o Title: Deep Reinforcement Learning for Smart Factory Operation
  - Sungzoon Cho, Seoul National University ([email protected])
    o Title: Industry Classification and Keyword Related Firm Identification through Neural Language Modeling of Corporate Disclosures
  - Sang Bok Ree, Department of Industrial Management and Systems Engineering ([email protected])
    o Title: Study of Noise Level of Taguchi Method


Toward a Better Understanding of Weak Signal: What Weak Signal is and How to Detect it
Yongtae Park
Seoul National University

As the world has become more complex, connected, and intertwined, and as technological innovations have become more influential, discontinuous transformations caused by new technologies occur more frequently. In this situation, much attention is paid to weak signal analysis in order to detect technologies that will be impactful in the future and to identify new technological opportunities in advance. Since Ansoff defined early indicators of impending impactful events as weak signals in the mid-1970s, many studies have been conducted to detect weak signals and find future opportunities. However, a variety of issues regarding weak signals, such as their definition, their characteristics, and where they arise, are still controversial among researchers. In this regard, this paper reviews several studies of weak signals in order to clarify what a weak signal is. Furthermore, we suggest a new definition of weak signal that comprehensively covers previous definitions, and we suggest which measure to use in weak signal detection. This paper is expected to assist weak signal analysis for finding new technological opportunities by promoting a better understanding of weak signals.


Deep Reinforcement Learning for Smart Factory Operation
Jonghun Park
Seoul National University

A framework for applying deep learning to the problem of smart factory operation is presented in this paper. The proposed framework is based on a deep reinforcement learning approach and aims to learn state-based operational decisions for the purpose of maximizing resource utilization. It is then employed to solve the problem of real-time dispatching in an automated semiconductor manufacturing system that requires effective dispatching decisions for lot selection, routing, split, and merge. Furthermore, we also demonstrate the viability of the proposed framework by applying it to the problem of controlling storage and retrieval operations for various types of automated warehousing systems. Experimental results show that the proposed framework works satisfactorily for the considered smart factory environments of real-life scale.


Industry Classification and Keyword Related Firm Identification through Neural Language Modeling of Corporate Disclosures
Sungzoon Cho
Seoul National University

Publicly traded firms regularly describe their self-identities in corporate disclosures such as Form 10-Ks. This textual information is usually read by human experts to understand what the firms do and how they are related to other firms. Here, we propose to employ doc2vec document embedding to interpret firms. In particular, the business sections of each firm's disclosures, as well as the vocabulary employed in them, are projected into the same high-dimensional space. Two obvious applications are industry classification and keyword-related firm identification. First, automated industry classification saves exhaustive time and human effort and removes subjectivity. Second, firms related to certain keywords can be easily identified. In this study, we apply the scheme to more than 700 securities comprising the Korea Composite Stock Price Index (KOSPI).
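A minimal gensim Doc2Vec sketch of the embedding step (toy corpus, arbitrary parameters, gensim 4.x API assumed; the disclosure preprocessing and the KOSPI universe of the study are not reproduced):

```python
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

# Hypothetical business-section texts keyed by firm ticker.
corpus = {
    "FIRM_A": "semiconductor memory fabrication equipment supplier",
    "FIRM_B": "retail banking loans deposits credit cards",
    "FIRM_C": "logic chips foundry wafer processing",
}
documents = [TaggedDocument(words=text.split(), tags=[ticker])
             for ticker, text in corpus.items()]

model = Doc2Vec(documents, vector_size=50, min_count=1, epochs=40)

# Firms most similar to a keyword vector, e.g. for keyword-related firm identification.
keyword_vec = model.infer_vector(["semiconductor"])
print(model.dv.most_similar([keyword_vec], topn=2))
```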


Study of Noise Level of Taguchi Method
Sangbok Ree
Seokyeong University

The problem of determining the levels of noise conditions in the Taguchi method depends on the specific application. We have found a way to reduce the number of experiments while still reflecting the noise conditions well. Published research on this topic includes the case where the distribution of noise is known, applications of the Taguchi technique in education, experiments on noise conditions with a paper helicopter, and various cases from books on Taguchi applications. In this paper, we propose an idea for finding the noise condition in the Taguchi method.


- Session 4: Reliability and Health Management
  Time and Location: 1:30 – 3:10 PM, ICR
  Organizer: Mei-Ling Ting Lee, University of Maryland
  Chair: Sangbum Choi, Korea University

  - Peihua Qiu, University of Florida ([email protected])
    o Title: Dynamic Screening: An Approach for Detecting Diseases Early
  - Chien-Yu Peng, Academia Sinica, Taiwan ([email protected])
    o Title: Degradation Analysis with Measurement Errors
  - Hideki Nagatsuka, Department of Industrial and Systems Engineering, Chuo University, Japan ([email protected])
    o Title: Inference for Extreme Value Models and Its Applications in Reliability
  - Mei-Ling Ting Lee, University of Maryland ([email protected])
    o Title: From Threshold Regression to a New Family of Shock-and-degradation Failure Models


Dynamic Screening: An Approach for Detecting Diseases Early
Peihua Qiu
Department of Biostatistics, University of Florida

In our daily life, we often need to identify individuals whose longitudinal patterns of certain disease risk factors (e.g., risk factors of stroke) are different from the patterns of well-functioning individuals, so that some unpleasant consequences can be avoided. In many such applications, observations of a given individual are obtained sequentially, and it is desirable to have a screening system that gives a signal as soon as possible after the individual's longitudinal pattern starts to deviate from the regular pattern, so that adjustments or interventions can be made in a timely manner. For such applications, conventional control charts cannot be applied directly because the in-control distribution of the quality variables changes over time. In this talk, we discuss a dynamic screening system proposed recently for that purpose, using statistical process control and longitudinal data analysis techniques. The method is demonstrated using a real-data example from the SHARe Framingham Heart Study. Because big data often take the form of data streams and control charts can handle data streams efficiently, this method can analyze big data effectively. This is joint research with Drs. Jun Li, Dongdong Xiang and Changliang Zou.


Degradation Analysis with Measurement Errors
Chien-Yu Peng
Academia Sinica, Taiwan

Degradation models are widely used to assess lifetime information for highly reliable products. When there are measurement errors in monotonic degradation paths, unsuitable model assumptions can lead to contradictions between physical/chemical mechanisms and statistical explanations. To settle this contradiction, this study presents an independent-increment degradation-based process that simultaneously considers the unit-to-unit variability, the within-unit variability and the measurement error in the degradation data. Several case studies show the advantages of the proposed models. This work also uses a separation-of-variables transformation with a quasi-Monte Carlo-type method and parallel computers to estimate the model parameters. A simple model-checking procedure is provided to assess the validity of the model assumptions.


Inference for Extreme Value Models and Its Applications in Reliability
Hideki Nagatsuka
Department of Industrial and Systems Engineering, Chuo University, Japan

Statistical modeling of extreme values (largest or smallest values) of certain natural phenomena (e.g., waves, floods, winds, lifetimes etc) is of interest in various practical applications. For example, the distributions of high waves and large floods are important in the designs of dikes and dams, respectively. In the field of reliability, the modeling of extreme values is also important, and various applications such as alloy strength prediction, memory cell failure and so on have been presented in the literature. The traditional approaches to the analysis of extreme values are based on the generalized extreme value distribution (GEVD) and the generalized Pareto distribution (GPD). The former is a limiting distribution for extreme values while the latter is a limiting distribution for the values larger than a given threshold (exceedances over the threshold). The GEVD and GPD are useful for analyzing extreme values since we do not need to know the underlying distributions of data. However, it is well known that these distributions have their moments only for the values of the shape parameter in certain ranges. In addition, they do not satisfy the regularity conditions for maximum likelihood (ML) estimation. This non-regular problem can result in the likelihood function being unbounded and so the ML estimators do not have desirable asymptotic properties, and confidence intervals and the likelihood ratio tests based on the asymptotic properties are not established. I will introduce the new method of parameter estimation proposed by authors. In this method, the estimates always exist uniquely, and the estimators have desirable asymptotic properties such as consistency and asymptotic normality over the entire parameter space. Interval estimation and tests based on this method will be discussed. In addition, some applications of this method based on real data sets in reliability area will be presented.
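For reference, the two limiting families mentioned in the abstract, in their standard forms:

```latex
\[
\text{GEVD: } G(x)=\exp\!\Big\{-\big[1+\xi\,\tfrac{x-\mu}{\sigma}\big]^{-1/\xi}\Big\},
\qquad
\text{GPD: } H(x)=1-\big[1+\xi\,\tfrac{x}{\sigma}\big]^{-1/\xi},
\]
% each defined where the bracketed term is positive; xi = 0 is interpreted as the limiting
% forms exp(-e^{-(x-mu)/sigma}) and 1 - e^{-x/sigma}, respectively.
```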


From Threshold Regression to a New Family of Shock-and-degradation Failure Models
Mei-Ling Ting Lee
Department of Epidemiology and Biostatistics, School of Public Health, University of Maryland

In survival analysis, the Cox regression is well known. It has, however, a strong proportional hazards assumption. In many engineering and medical contexts, degradation progresses until a failure event (such as death) is triggered when the strength/health level first reaches a failure threshold. I'll present the Threshold Regression (TR) model, which requires few assumptions and, hence, is quite general in its potential application. Many systems experience gradual degradation while simultaneously being exposed to random shocks that eventually cause failure when a shock exceeds the residual strength of the system. This failure mechanism is found in diverse fields of application. A tractable new family of shock-and-degradation models will be presented. This family has the attractive feature of defining the failure event as a first passage event and the time to failure as a first hitting time (FHT) of a threshold by an underlying stochastic process. This shock-and-degradation family includes a wide class of underlying degradation processes. We derive the survival function for the shock-degradation process as a convolution of the Fréchet shock process and any candidate degradation process that possesses stationary independent increments. Examples and statistical properties of the survival distribution will be discussed.


- Parallel Session 5: Quality Monitoring with Big Data
  Time and Location: 1:30 – 3:10 PM, 25-210
  Organizer: Suk Joo Bae (Hanyang University)
  Chair: Suk Joo Bae, Hanyang University

  - Chuljin Park, Department of Industrial Engineering, Hanyang University, Korea ([email protected])
    o Title: A New Nonlinear Parameter Estimation and Its Application to Critical Dimension Measurements in a Semiconductor Industry
  - Jong-Seok Lee, Department of Systems Management Engineering, Sungkyunkwan University, Korea ([email protected])
    o Title: Development of Parameter-Free Data Mining Algorithms for Ordinary Users
  - Suk Joo Bae, Department of Industrial Engineering, Hanyang University, Korea ([email protected])
    o Title: On-line Monitoring Methodology of Equipment Health Using Big-Sized Sensor Data


A New Nonlinear Parameter Estimation and Its Application to Critical Dimension Measurements in a Semiconductor Industry
Chuljin Park
Hanyang University

Parameter estimation for a nonlinear function arises in a wide range of applications, such as chemical engineering, mechanical engineering, electronic engineering, and even data science. Traditional approaches to parameter estimation for a nonlinear function use gradient-based optimization algorithms, such as the Gauss-Newton algorithm and the Levenberg-Marquardt algorithm. However, such algorithms perform poorly when: (1) gradient estimates are inaccurate due to ill-conditioned matrices, (2) the objective function is too flat, or (3) there are many local optima. In this study, we develop a new approach, referred to as partition-based parameter estimation, using an optimization-via-simulation algorithm to overcome such difficulties. The validity of the approach is investigated and the approach is tested on some numerical examples. We then apply the partition-based parameter estimation approach to finding the critical dimension parameter in the semiconductor industry and show the impact of the suggested approach compared with existing approaches.
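For contrast with the gradient-based baselines mentioned, a standard Levenberg-Marquardt fit via SciPy looks like this (toy model and data; the partition-based, optimization-via-simulation approach of the talk is not shown):

```python
import numpy as np
from scipy.optimize import least_squares

# Toy nonlinear model: y = a * exp(-b * x) + c, to be fitted to noisy data.
rng = np.random.default_rng(0)
x = np.linspace(0, 4, 60)
y = 2.5 * np.exp(-1.3 * x) + 0.4 + rng.normal(scale=0.02, size=x.size)

def residuals(theta):
    a, b, c = theta
    return a * np.exp(-b * x) + c - y

fit = least_squares(residuals, x0=[1.0, 1.0, 0.0], method="lm")  # Levenberg-Marquardt
print(fit.x)   # with poor starting values or flat objectives, LM can stall in local optima
```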


Development of Parameter-Free Data Mining Algorithms for Ordinary Users
Jong-Seok Lee
Sungkyunkwan University

Most learning algorithms have their own parameters to be tuned based on given training data. Well-known examples include the number of nearest neighbors (NN) in NN-based algorithms and the window size in Gaussian-kernel-based algorithms. The problem of setting the best parameters becomes troublesome when the data have characteristics different from what the learning algorithms assume. Imbalanced data classification is a representative example in which we have to determine additional parameters such as resampling rates or misclassification costs. Outlier detection is another area with the same problem of setting additional parameters. In this presentation, we suggest research directions for those areas, aiming at reducing the burden of setting parameters. We believe that this subject is well worth considering because its successful outcomes would help ordinary users correctly use learning algorithms in their real-world applications.


On-line Monitoring Methodology of Equipment Health Using Big-Sized Sensor Data
Suk Joo Bae
Hanyang University

Condition-based maintenance (CBM) has received increasing attention as a cost-effective maintenance policy, taking maintenance actions only when there is imminent evidence of failure in a monitored system. The parameters indicating the health status of the system are continuously monitored in CBM. This paper reviews recent on-line monitoring methodologies for examining equipment health using large-volume sensor data for maintenance purposes. We provide an example of monitoring a steam turbine generator based on a multivariate T^2 chart using linear energy profiles of temperature signals.

