Classification of ERP Components with Imbalanced Datasets

A Comparison of Linear Discriminant Analysis and Logistic Regression

Master's thesis

in the degree program Computing in the Humanities at the Faculty of Information Systems and Applied Computer Science of the Otto-Friedrich-Universität Bamberg

December 22, 2015

Author: Mike Imhof (matriculation number: 150 93 49)

Reviewer: Prof. Dr. Ute Schmid, Professorship of Applied Computer Science, esp. Cognitive Systems

Abstract

Linear discriminant analysis (LDA) is a widely used method for single-trial analysis and classification of event-related potential (ERP) components. Although it makes more assumptions about the underlying data than other methods, LDA is recommended even when these requirements are not fulfilled: a loss in performance is accepted in exchange for the simplicity of LDA. In this thesis, LDA is compared with logistic regression (LR), which is similarly simple but makes fewer assumptions than LDA. Both methods are applied to electroencephalogram data containing an error-related negativity, an ERP component that occurs during error trials in choice reaction tasks and belongs to the group of error-related potentials (ErrPs). Recent studies have shown that classification of ErrP components improves the performance of systems using a brain-computer interface (BCI). Datasets containing ErrP components are usually characterized by class imbalance, with the error trials (targets) being under-represented. Under these conditions, LR yields better results than LDA, as a performance evaluation using the receiver operating characteristic shows. These results indicate that LR should be preferred for imbalanced datasets in which the target class is under-represented.

Contents

Abstract  2
Contents  3
List of Figures  5
List of Tables  6
1 Introduction  7
2 Classification of ERP Components  9
  2.1 ERP Components  9
    2.1.1 Error-Related Potentials  11
  2.2 Single-Trial Analysis and ERP Classification  13
    2.2.1 A State-of-the-Art Approach for ERP Classification  13
    2.2.2 Detection of Error-related Potentials  15
  2.3 Classification with Imbalanced Datasets  16
    2.3.1 Class Imbalance Problem  16
    2.3.2 Solutions for Imbalanced Learning  17
3 Approach for Classifying ERP Components with Imbalanced Datasets  19
  3.1 Requirements and the Rationale of the Approach  19
  3.2 A Proposal for Classification of Error-related Potentials with Imbalanced Datasets  20
    3.2.1 Acquisition of Data  20
    3.2.2 Feature Extraction  21
    3.2.3 Linear Classification Techniques  22
  3.3 Classification Model Evaluation  25
    3.3.1 Estimating the Generalization Error: Evaluation Methods  25
    3.3.2 Performance Scoring Functions  27
    3.3.3 Applied Evaluation Procedure in this Work  28
4 Implementation  29
  4.1 Data Acquisition  29
    4.1.1 Experimental Setup  29
    4.1.2 Data preparation  32
    4.1.3 Data selection  32
    4.1.4 ERPs  34
  4.2 Classification Procedure  34
    4.2.1 Data Preparation  35
    4.2.2 Feature Selection  37
    4.2.3 Parameter Optimization  40
  4.3 Evaluation  42
    4.3.1 Results for Classification  42
5 Conclusion and Future Work  47
6 References  49
7 Appendix  53
  7.1 Performance Results  53
  7.2 Media Contents  56

List of Figures

2.1 Illustration from Luck (2014) of an example ERP experiment  10
2.2 Typical ERP waveforms following correct and erroneous responses  12
2.3 Mental typewriter Hex-o-Spell  14
3.1 A visualization of the signed r²-matrix built from EEG recordings of one subject with the two classes correct and incorrect response  22
4.1 Exemplary Flanker stimuli pairs  30
4.2 Procedure of a trial of the adapted Flanker experiment that delivered data for classification learning in this work  31
4.3 Numbers of all trials divided into correct and incorrect responses for each participant  33
4.4 Grand average of the event-related potentials for correct (target) and incorrect responses (non-target) in the Flanker experiment at electrode FCz  34
4.5 Classification procedure  36
4.6 Separability indices for participants with id AHE02F, AVM26E and IGN31N  38
4.7 Separability indices for participants with id IRL26N, IZR20F and NME11N  39
4.8 Performance of LDA with shrinkage of the covariance matrix and LR for each participant  43
4.9 Comparison of averaged performance of all participants for LDA with shrinkage of the covariance matrix and LR  44
4.10 ROC curves of LDA with shrinkage of the covariance matrix and LR for each participant  46

List of Tables

4.1 List of all chosen participants for classification learning in this work  33
4.2 The chosen time intervals for each participant  37
4.3 Best parameter settings for logistic regression resulting from a grid search  42
4.4 Results of classification of LDA with shrinkage of the covariance matrix and LR  44
7.1 Accuracy and AUC values of LDA in combination with shrinkage and LR for each participant and each iteration of a ten-fold cross-validation  54

1 Introduction

Understanding brain functions is one of the main challenges in modern neuroscience. Identifying mental states or human intentions by decoding single-trial electroencephalogram (EEG) data would bring many advantages for fundamental brain research and for applications with a brain-computer interface (BCI). A BCI is a system that generates control signals from the intention of a subject solely by decoding single-trial EEG, without using the activity of any muscles or peripheral nerves. EEG recordings are obtained non-invasively by means of electrodes placed on the scalp. This branch of research is strongly influenced by the development of interfaces connecting the human brain and a computer. In this context, the option to classify single trials in EEG data is very helpful and has received much attention. A popular example of a BCI application is attention-based typewriting, introduced by Farwell and Donchin (1988). Such a system may be used as a communication device by individuals who cannot use any motor system for communication (e.g. patients with locked-in syndrome). But basic research beyond BCIs can also benefit from progress in the domain of single-trial analysis. The investigation of ways to decode an individual's sensory, cognitive, or affective processes from EEG data can advance research on brain functions. Decoding these processes makes it possible to "read the mind" and can help to reveal how these processes are neurally encoded in the brain. Mental typewriting and other applications are mainly based on the detection of the P300, a component of the event-related potential (ERP) that is related to attention processes. However, it also makes sense to use other ERPs for BCI applications.
For instance, the detection of error-related potentials (ErrPs) is known to improve the so-called bitrate of a BCI (Ferrez & Millán, 2005, 2008; Schmidt, Blankertz, & Treder, 2012), i.e. the ratio of the amount of transferred data to time, or the communication speed. Improvements of the bitrate increase the ease of use of a BCI, since bitrates in this domain are usually very low.


CHAPTER 1. INTRODUCTION

In order to detect brain signals for communication, a BCI has to learn to recognize components of the EEG. That means it has to be calibrated before its usage, and this procedure should be fast and simple to apply. However, when learning ErrP detection, it is difficult to obtain a sufficient amount of recorded error trials, because one characteristic of ErrP data is its sparsity and imbalance. The aim of this thesis is to apply a state-of-the-art method in the domain of ERP classification, linear discriminant analysis (LDA), to an ErrP dataset and to compare the outcome with the results of a logistic regression (LR) applied to the same dataset. In contrast to LR, LDA makes assumptions about the distribution of the features for optimal classification. It is expected that LDA is not robust enough in terms of meeting these assumptions when learning on small and imbalanced datasets, even in combination with shrinkage of the covariance matrix when many features are used. LR might yield better results since it is very similar to LDA with respect to its structure and procedure, but does not rely on such distributional assumptions. This work is organized as follows. The second chapter describes the concept of ERP components, including an approach for classifying these components, and gives an overview of methods for handling imbalanced datasets. The third chapter presents a concrete approach for ERP classification with imbalanced datasets and a procedure for evaluating the comparison of LDA and LR. The implementation of this approach is covered in the fourth chapter. Finally, the fifth chapter draws conclusions from the findings and provides ideas for future modifications.


2 Classification of ERP Components

The following sections provide an overview of the nature of event-related potential (ERP) components in general and error-related potentials (ErrPs) in particular. They also describe methods for classification and single-trial analysis of ErrPs. Finally, a description of methods for handling imbalanced datasets completes this chapter.

2.1 ERP Components

ERP components are part of the ERP, which describes voltage fluctuations in the ongoing electroencephalogram (EEG). These fluctuations are time-locked to an event, e.g. the onset of a stimulus or the execution of a response (see Figure 2.1 C). The ERP waveform appears as a series of positive and negative peaks that vary in polarity, amplitude and duration. It is a depiction of the changes in scalp-recorded voltage over time that reflect the sensory, cognitive, affective, and motor processes triggered by a stimulus. ERPs are usually much smaller than the raw EEG and, due to noise in the EEG, cannot be observed without further processing. Since it is assumed that the underlying neurophysiological, cortical processes are the same across repetitions of stimuli, responses or mental states, ERPs can be made visible by averaging many signals recorded under the same conditions (see Figure 2.1 F). In contrast, the background EEG activity is randomly distributed with respect to a repeatedly occurring triggering event. By averaging many signals, this background activity is cancelled out and the event-related activity is emphasized. As the number of EEG segments increases, the signal-to-noise ratio improves and the ERP signal emerges.
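The effect of averaging can be illustrated with a small NumPy simulation; the sampling rate, the sinusoidal "ERP" and the noise level below are illustrative assumptions, not values from this thesis.

```python
import numpy as np

rng = np.random.default_rng(0)

fs = 250                                # sampling rate in Hz (assumed)
t = np.arange(0, 0.8, 1 / fs)           # one 800 ms epoch
erp = 5e-6 * np.sin(2 * np.pi * 2 * t)  # toy 5 µV event-locked component
noise_sd = 20e-6                        # background EEG noise, ~20 µV

def average_trials(n_trials):
    """Average n_trials noisy epochs that all contain the same ERP."""
    trials = erp + rng.normal(0.0, noise_sd, size=(n_trials, t.size))
    return trials.mean(axis=0)

# The residual noise in the average shrinks roughly with 1/sqrt(N),
# so the ERP emerges as more segments enter the average.
err_10 = np.abs(average_trials(10) - erp).mean()
err_1000 = np.abs(average_trials(1000) - erp).mean()
print(err_10, err_1000)
```

With a hundredfold increase in trials, the residual noise drops by about a factor of ten, which is exactly the signal-to-noise improvement described above.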


CHAPTER 2. CLASSIFICATION OF ERP COMPONENTS

Figure 2.1: An illustration from Luck (2014) of an example ERP experiment. The experiment uses the oddball paradigm, where frequent Xs and infrequent Os are presented on a computer monitor while EEG is recorded from several electrodes in combination with ground and reference electrodes (A). The electrode placement follows the widespread international 10/20 system (B). In panel C, only a midline parietal electrode (Pz) is shown. After filtering and amplifying the signals from all electrodes, the continuous analog signals are converted into discrete sets of digital samples (D). The stimulus presentation computer also sends event codes that mark the onset time and identity of each stimulus and response. The raw EEG from electrode Pz is shown over a period of 9 s (C). The rectangles mark 900 ms epochs of EEG, beginning 100 ms before the onset of each stimulus. These segments were lined up with respect to the onset of the stimulus (E). Separate averages were then computed for the X and O segments (F).

The waveform of the observed ERP can be divided into peaks and components. Peaks are reliable local positive or negative maxima of the ERP waveform, whereas non-reliable local maxima result from high-frequency noise. Components depict discrete intracerebral sources of voltage that reflect neurocognitive processes. Luck (2014) defines an ERP component as "a scalp-recorded neural signal that is generated in a specific neuroanatomical module when a specific computational operation is performed" (Luck, 2014, p. 66). There are different ERP components for different cognitive processes, and they are commonly divided into three main categories:

1. Exogenous sensory components that are elicited by a stimulus
2. Endogenous components that reflect neurocognitive processes dependent on a task
3. Motor components that arise when a motor response is prepared and executed

The following section is concerned with a group of endogenous ERP components that occur after an erroneous event has happened. Such an event can be an erroneous response of a subject or incorrect behaviour of a system the subject interacts with.

2.1.1 Error-Related Potentials

Error-related potentials (ErrPs) are a group of ERP components that occur after erroneous behaviour. They are typically characterized by a negative deflection in the EEG signal with a frontocentrally maximal distribution. ErrPs differ depending on the type of erroneous behaviour. The best-known ErrP is usually referred to as error-related negativity (ERN) and is followed by a positive deflection over parietal regions referred to as error positivity (PE).

Response ErrPs/Error-related negativity  The error-related negativity (ERN), sometimes also called error negativity (NE), was discovered simultaneously by two research groups (Falkenstein, Hohnsbein, Hoormann, & Blanke, 1990; Gehring, Coles, Meyer, & Donchin, 1990). It is an ERP component that occurs during error



Figure 2.2: Typical ERP waveforms after correct and erroneous responses, recorded in an arrow flanker task at electrode FCz in 30 individuals in a study of Riesel, Endrass, Kaufmann, and Kathmann (2011), and the scalp topography of erroneous responses. Note that the ERP is plotted with negative voltages upward.

trials in choice reaction tasks where subjects have to respond to a stimulus by pressing a button. It has therefore been thought to represent the activity of a response-monitoring system. Its peak occurs about 50 ms after an erroneous response. It is followed by the PE, a larger positive deflection that arises 200-400 ms after the button press. The typical waveform of both components is depicted in Figure 2.2.

Feedback ErrP  Another ErrP occurs in reinforcement learning tasks when participants respond incorrectly and receive feedback indicating the outcome of the response. Whereas the ERN is time-locked to an erroneous response, the feedback-related negativity (FRN) is a negative deflection 250-300 ms after the occurrence of a feedback stimulus (Holroyd & Coles, 2002; Luu, Tucker, Derryberry, Reed, & Poulsen, 2003).

Observation ErrP  Observing erroneous responses of other persons can also lead to negative deflections in the ERP that are similar to the FRN (van Schie, Mars, Coles, & Bekkering, 2004).

Interaction ErrP  In a BCI scenario, an error is mostly caused by a misclassification of the BCI system and not by the user himself or by another person. However, there is an ErrP called interaction ErrP (Ferrez & Millán, 2008) that arises with an ERN-like component about 270 ms as well as a PE-like component 350-450


ms following the erroneous input. Ferrez and Millán (2005) found an interaction ErrP in human-robot interaction and used it to improve the bitrate of the robot control. Other studies have also focused on this kind of ErrP to improve the robustness and flexibility of BCI systems (Buttfield, Ferrez, & Millán, 2006; Schmidt et al., 2012; Dal Seno, Matteucci, & Mainardi, 2010). The aim of these works is to detect errors made by the interface, which are caused by misinterpretations of the user's intention, and to subsequently correct them. Schmidt et al. (2012) have shown that online detection of interaction ErrPs can significantly improve the performance of an attention-based typewriting system that uses a BCI.

2.2 Single-Trial Analysis and ERP Classification

ERP classification can be an effective means to translate brain activity into commands for an electronic device. Therefore, the BCI community is very interested in the application of single-trial analysis. Although there has been much progress in the past two decades, some issues have not yet been satisfactorily solved. A fundamental problem in the analysis of single-trial responses is the interference of task-relevant signals by task-unrelated brain activities, also referred to as the low signal-to-noise ratio (SNR) of the observed single-trial responses (Blankertz, Lemm, Treder, Haufe, & Müller, 2011). Furthermore, in most cases the recommended number of training samples needed to properly describe the different classes is five to ten times their dimensionality (Raudys & Jain, 1991), a problem often referred to as the curse of dimensionality. Moreover, there is no general method for classifying ERP components; the choice always depends on the application and on what the system should be able to do. The following sections describe a state-of-the-art approach that can be used for classification of ERP components (see Section 2.2.1) as well as approaches that have recently been used for classification of ErrPs (see Section 2.2.2).

2.2.1 A State-of-the-Art Approach for ERP Classification

There are many methods for ERP classification. In a tutorial for single-trial analysis and classification of ERP components, Blankertz et al. (2011) propose linear discriminant analysis (LDA; for a detailed description see Section 3.2.3) as a basic classification algorithm.



Figure 2.3: Mental typewriter Hex-o-Spell that was used by Blankertz, Lemm, Treder, Haufe, and Müller (2011) to demonstrate their ERP classification approach. In this typewriter, symbols can be selected in a two-level procedure. The left side of the graphic depicts the first level, which consists of groups of 5 symbols. The user has to select one of them to get to the second level, which is depicted on the right side. At this level, the symbols of the selected group are distributed to all discs. An empty disc can be used as 'undo' to return to the group level without selection.

There is evidence that in some cases the best method to use for model selection is ten-fold cross-validation, even if computation power allows using leave-one-out cross-validation (Kohavi, 1995).

Bootstrapping  Instead of analyzing subsets of the complete sample set repeatedly, as is done during cross-validation, bootstrapping means repeatedly taking subsamples of the full sample set to analyze the performance of a learning model. A subsample is a random sample with replacement from the complete sample set (Efron & Tibshirani, 1993). The question of how many subsamples should be used depends


on the research goal; it might be anywhere from 50 to 2000. In many cases, bootstrapping seems to work better than cross-validation (Efron, 1983).
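A minimal sketch of this resampling scheme in NumPy, using made-up per-trial classifier outcomes (the 80% accuracy and the number of subsamples are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical per-trial outcomes of a classifier (1 = correct prediction).
outcomes = rng.binomial(1, 0.8, size=200)

# Bootstrapping: repeatedly draw a subsample of the SAME size as the full
# sample set, with replacement, and recompute the statistic on each draw.
n_boot = 1000
boot_means = np.array([
    rng.choice(outcomes, size=outcomes.size, replace=True).mean()
    for _ in range(n_boot)
])

# The spread of the bootstrap distribution estimates the uncertainty of
# the accuracy, e.g. as a 95% percentile confidence interval.
ci_low, ci_high = np.percentile(boot_means, [2.5, 97.5])
print(round(ci_low, 3), round(ci_high, 3))
```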

3.3.2 Performance Scoring Functions

To estimate the generalization error, i.e. how well a learning model generalizes to unseen data, different performance measures can be used. The following two sections describe the two measures used in this work.

Accuracy score  The accuracy function computes the fraction of correct predictions out of all predictions. If ŷ_i is the predicted class of the i-th sample and y_i is the true class, then the fraction of correct predictions over n samples is defined as

    accuracy(y, ŷ) = (1/n) Σ_{i=1}^{n} 1(ŷ_i = y_i)    (3.8)

where 1(x) is the indicator function.

Receiver operating characteristic  The receiver operating characteristic (ROC; Green & Swets, 1966) curve is used to visualize the performance of a binary classifier or predictive model as its discrimination threshold is varied (see e.g. Zou, O'Malley, & Mauri, 2007). More precisely, it is a plot of the true positive rate (TPR) as a function of the false positive rate (FPR) at various threshold settings. The TPR is the fraction of true positives out of all positives, and the FPR is the fraction of false positives out of all negatives. A diagonal line in this plot indicates a performance no better than a completely random guess. The curve can be summarized in a single number, the area under the ROC curve (AUC), which lies between 0 and 1. The higher the AUC value, the better the performance of the classifier: an AUC of 1.0 means the binary classifier discriminates perfectly between the two classes, whereas a completely random guess yields an AUC of 0.5. In contrast to the overall accuracy, the ROC curve is insensitive to imbalanced class proportions in the dataset. Therefore, the overall accuracy or error rate


CHAPTER 3. APPROACH FOR CLASSIFYING ERP COMPONENTS WITH IMBALANCED DATASETS

does not provide adequate information in the case of imbalanced learning (He & Garcia, 2009). In this case, the ROC curve is a more informative assessment metric.
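The contrast between the two measures can be made concrete with scikit-learn (the library used for classification in Chapter 4); the labels and scores below are constructed toy data with a 90/10 class imbalance, not results from this thesis.

```python
import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score, roc_curve

# Toy imbalanced problem: 90 non-targets, 10 targets, with classifier
# scores that separate the classes only partially.
y_true = np.array([0] * 90 + [1] * 10)
scores = np.concatenate([np.linspace(0.0, 0.6, 90), np.linspace(0.4, 1.0, 10)])

y_pred = (scores >= 0.5).astype(int)     # fixed threshold at 0.5
acc = accuracy_score(y_true, y_pred)     # fraction of correct predictions
auc = roc_auc_score(y_true, scores)      # threshold-independent AUC

# Note: always predicting the majority class would already give 90%
# accuracy here, but an AUC of only 0.5 (chance level).
fpr, tpr, thresholds = roc_curve(y_true, scores)
print(acc, auc)
```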

3.3.3 Applied Evaluation Procedure in this Work

In this work, a stratified ten-fold cross-validation with random splits in ten iterations is used to evaluate both classification techniques. The cross-validation is repeated ten times because this promises a better estimate of the generalization error than performing the ten-fold cross-validation only once; the repetitions prevent the random splits from unduly influencing the estimate. As performance measures, the accuracy score as well as the area under the ROC curve are used. The accuracy score is a widespread evaluation measure, whereas the ROC curve is a good candidate for imbalanced datasets. After evaluating both classification techniques, a statistical comparison is conducted by performing a 2 x 10 repeated-measures analysis of variance (ANOVA) with the factors classifier (LDA vs. LR) and cross-validation iteration (10 iterations), separately for each performance measure. A better performance of one classifier should be indicated by a significant main effect of classifier.
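A sketch of this evaluation scheme with scikit-learn, run here on synthetic imbalanced stand-in data rather than the EEG features of Chapter 4 (dataset size, imbalance ratio and classifier settings are illustrative assumptions):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

# Synthetic stand-in for one participant: ~220 trials, ~8% targets.
X, y = make_classification(n_samples=220, n_features=20, weights=[0.92],
                           random_state=0)

# Stratified ten-fold cross-validation, repeated ten times with
# different random splits -> 100 scores per classifier.
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=10, random_state=0)

lda = LinearDiscriminantAnalysis(solver='lsqr', shrinkage='auto')
lr = LogisticRegression(max_iter=1000)

auc_lda = cross_val_score(lda, X, y, cv=cv, scoring='roc_auc')
auc_lr = cross_val_score(lr, X, y, cv=cv, scoring='roc_auc')

# These per-iteration scores are what the 2 x 10 repeated-measures
# ANOVA with the factors classifier and iteration is computed on.
print(auc_lda.mean(), auc_lr.mean())
```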


4 Implementation

This chapter focuses on the implementation of my approach. The first section describes the data acquisition process. The classification procedure is presented in the second section. The third section completes this chapter with an evaluation of my approach.

4.1 Data Acquisition

In this section, the procedure for acquiring a dataset containing ErrPs is described. It provides information about the experimental setup, data preparation and data selection.

4.1.1 Experimental Setup

The dataset used for classification in this experiment was taken from a study formerly conducted at the Department of Experimental Psychology at the University of Bamberg (Lugauer, 2015; Otte, 2015).

Participants  In this study, 54 participants (8 male, 46 female) were selected from a sample of 132 undergraduate psychology students of the University of Bamberg. The selection was based on their conscientiousness scores. The participants were between 18 and 38 years old (M = 10.93, SD = 3.54). The participants were divided into two groups, one with low and one with high values on the conscientiousness scale of the personality inventory NEO-FFI (Borkenau, 2008). In both groups, the values differed by at least 0.8 from the mean of the test's global norm.


CHAPTER 4. IMPLEMENTATION

Congruent:
◄ ◄ ◄ ◄
◄ ◄ ◄ ◄ ◄

Incongruent:
◄ ◄ ◄ ◄
◄ ◄ ► ◄ ◄

Figure 4.1: Exemplary Flanker stimuli pairs as used in the experiment. Congruent trial: the target triangle points to the left and the flanker triangles point in the same direction. Incongruent trial: the target triangle points to the right and the flanker triangles point to the left. The flanker stimuli are usually presented a few milliseconds earlier than the target. The opposite triangle direction is also possible for both congruent and incongruent trials.

Procedure

The test procedure mainly consists of an adapted version of the Eriksen flanker task. In this task, participants had to respond to a target triangle that points either to the left or to the right, is located in the center of a monitor, and is flanked by non-target stimuli (see Figure 4.1). The flankers were also triangles that point either in the same direction as the target stimulus (congruent condition) or in the opposite direction (incongruent condition). Each trial began with a 1000 ms presentation of a dot in the center of the monitor. Then, an instruction was displayed for an additional 1000 ms. It was followed by a fixation cross for 500 ms and the flanker stimuli for 50 ms. Finally, the target stimulus was displayed until the participant gave a response. Figure 4.2 depicts the procedure of an exemplary standard trial. Participants passed through a total of 600 trials in this way. The trials were split into 480 standard trials and 120 trials belonging to one of two other experimental conditions: a rule-breaking condition, in which the instruction was "Commit an error", and a rule-changing condition, in which the instruction was "Change the rule". Participants of each conscientiousness group were randomly assigned to one of the two conditions.



Figure 4.2: Procedure of a trial of the adapted Flanker experiment that delivered data for classification learning in this work. At the beginning, a blank screen with a dot in the center of the monitor is presented for 1000 ms. After that, the participant is shown a short instruction text ("Beachte die Regeln", "Obey the rules") for 1000 ms, followed by a 500 ms presentation of a fixation cross. Finally, after a 50 ms presentation of the flanker stimuli, the target stimulus is presented until the participant responds.



4.1.2 Data preparation

EEG data was recorded throughout the session by means of an EasyCap (EASYCAP GmbH, Germany) electrode system based on the 10-20 system. The following electrodes were used: FP1, FP2, Fz, F3, F4, F7, F8, FC5, FC6, FCz, C3, C4, Cz, CP1, CP2, CP5, CP6, T7, T8, TP9, TP10, P3, P4, P7, P8, Pz, O1, O2. Horizontal ocular movements were recorded from electrodes at the outer canthi of both eyes. The electrode AFz was used as ground and an electrode on the nose tip as reference. The signal was amplified by a BrainAmp amplifier (Brain Products GmbH, Germany) with a sampling rate of 250 Hz and a resolution of 16 bit per channel. Impedances were below 5 kΩ for the frontocentral electrodes (Fz, FCz, Cz, Pz) and below 10 kΩ for all other electrodes. The EEG signal was high-pass filtered at 0.01 Hz, low-pass filtered at 70 Hz, and a 50 Hz notch filter was applied. After that, the recorded EEG data were further prepared in BrainVision Analyzer 2.0.1 (Brain Products, Germany). First, the recordings for each participant were divided into response-locked segments: for each stimulus-response pair, a segment was created starting 200 ms before and ending 600 ms after the response. A baseline correction using the 200 ms pre-response interval was applied to each segment. Segments with artefacts due to horizontal eye movements or noisy channels, as well as segments with amplitudes above 80 µV or below -80 µV, were rejected from further data analysis.
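The segmentation, baseline correction and amplitude-based rejection described above can be sketched as follows; the toy EEG array, response markers and channel count are illustrative assumptions (the actual preparation was done in BrainVision Analyzer):

```python
import numpy as np

rng = np.random.default_rng(1)

fs = 250                                           # sampling rate (Hz)
eeg = rng.normal(0.0, 15e-6, size=(28, fs * 60))   # 28 channels, 60 s toy EEG
response_samples = [2500, 5000, 7500]              # hypothetical response markers

pre, post = int(0.2 * fs), int(0.6 * fs)           # -200 ms .. +600 ms
epochs = []
for r in response_samples:
    seg = eeg[:, r - pre:r + post].copy()
    # baseline correction: subtract the mean of the 200 ms pre-response interval
    seg -= seg[:, :pre].mean(axis=1, keepdims=True)
    # reject segments with amplitudes above 80 µV or below -80 µV
    if np.abs(seg).max() <= 80e-6:
        epochs.append(seg)

epochs = np.array(epochs)   # shape: (n_trials, n_channels, n_samples)
print(epochs.shape)
```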

4.1.3 Data selection

For the application of my classification approach, I concentrate only on incongruent standard trials, because this condition has the highest number of erroneous responses. Trials from the rule-breaking and rule-changing conditions were rejected because the tasks in both conditions differed from the standard task and could therefore cause task-dependent differences in the EEG data that disturb the detection of erroneous responses. In addition, all trials from subjects with fewer than ten error trials were excluded from classification learning. After rejecting all trials and subjects not satisfying these criteria, trials from six subjects remained (see Table 4.1 and Figure 4.3). The participants of this sample have on average 193.67 (SD = 21.3) correct and 17.17 (SD = 2.54) incorrect trials.
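The subject-level selection rule amounts to a simple threshold filter; the second subject id below is invented to show a rejection:

```python
# Hypothetical per-subject trial counts; 'XYZ01A' is a made-up example
# of a subject that would be rejected for having fewer than ten errors.
trial_counts = {
    'AHE02F': {'correct': 197, 'incorrect': 18},
    'XYZ01A': {'correct': 205, 'incorrect': 6},
}

MIN_ERROR_TRIALS = 10
selected = {sid: counts for sid, counts in trial_counts.items()
            if counts['incorrect'] >= MIN_ERROR_TRIALS}
print(sorted(selected))  # ['AHE02F']
```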



Table 4.1: Chosen participants from the sample of the studies of Lugauer (2015) and Otte (2015), with the numbers of correct and incorrect trials used for classification learning in this work

Subject ID    Correct          Incorrect       Total
AHE02F        197              18              215
AVM27E        212              21              233
IGN31N        196              12              208
IRL26N        154              21              175
IZR20F        212              16              228
NME11N        191              15              206
Mean (SD)     193.67 (21.3)    17.17 (2.54)    210.83 (20.59)


Figure 4.3: Numbers of all trials divided into correct and incorrect responses for each participant


Figure 4.4: Grand average of the event-related potentials for correct (non-target) and incorrect responses (target) in the flanker experiment at electrode FCz. Note that the ERP is plotted with negative voltages upward. The plot shows the EEG waveform of correct (blue line) and incorrect responses (red line) in a segment of 800 ms starting 200 ms before and ending 600 ms after the response.

4.1.4 ERPs In Figure 4.4, the grand averages of correct and incorrect responses are depicted. The ERP data from all six participants and trials were segmented around the responses and averaged. The waveforms of both classes clearly differ from each other.
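Computationally, a grand average like the one in Figure 4.4 is simply a mean over trials (and then over participants). A minimal sketch on random placeholder data for a single channel:

```python
import numpy as np

# Random stand-ins for response-locked single-trial data at one channel
# (trials x time points); the ERN would only appear in real data.
rng = np.random.default_rng(5)
correct_epochs = rng.normal(size=(194, 200))
incorrect_epochs = rng.normal(size=(17, 200))

erp_correct = correct_epochs.mean(axis=0)       # grand average, correct responses
erp_incorrect = incorrect_epochs.mean(axis=0)   # grand average, incorrect responses
difference_wave = erp_incorrect - erp_correct   # highlights error-related activity
```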

4.2 Classification Procedure The EEG records are the basis for the classification procedure depicted in Figure 4.5. The whole procedure aims at obtaining a classifier that detects whether an incoming trial is a correct response or not. First, all relevant EEG records are collected by exporting them from BrainVision Analyzer and saving them to one folder. A Python script then reads the contained data and arranges it in a specific order so that it can be further processed. After this preparation, the script saves the data into a csv file (for details see Section 4.2.1). The next step is to calculate signed r²-values of the pointwise biserial correlation coefficient for each pair of time point and channel. This is done by a spreadsheet calculation in Microsoft Excel 2013, and a graphical visualization of this information is produced by an R (Version 3.2.0; R Core Team, 2015) script. Using the graphical visualization, one has to inspect where the discriminative information lies and define intervals for the spatio-temporal features. These intervals are the basis for data aggregation and feature selection by means of a script in IBM SPSS Statistics 23 (for details see Section 4.2.2). The result of the feature selection is a csv file that contains id, class and feature variables for each trial. A Python (Version 2.7) script using the module scikit-learn (Version 0.16; Pedregosa et al., 2011) takes the csv file as input and performs a parameter optimization for logistic regression (for details see Section 4.2.3). Finally, it applies a logistic regression and a linear discriminant analysis to the data (for results see Section 4.3).

Figure 4.5: Classification procedure. [Flowchart: EEG records → Python script (preprocessing) → csv → R script (creating diagrams for feature selection) → SPSS script (aggregating data and selecting features) → csv → Python script (classification)]

Table 4.2: The chosen time intervals for each participant. Boundaries are data points (in milliseconds) in the segment.

Interval   AHE02F    AVM27E    IGN31N    IRL26N    IZR20F    NME11N
1          4-16      4-28      4-16      4-24      8-20      4-36
2          24-56     32-48     20-36     28-48     24-40     124-148
3          124-132   52-60     40-48     52-80     44-80     212-260
4          156-192   64-84     64-88     116-156   84-104    304-352
5          204-256   136-156   128-160   176-196   200-216   360-396
6          260-296   180-200   164-204   200-292   220-264   -
7          300-324   220-244   224-260   296-328   268-296   -
8          -         304-324   280-320   340-356   300-332   -
9          -         340-364   -         -         348-384   -

4.2.1 Data Preparation In a first step, all relevant EEG records are collected. Due to data analysis purposes specific to the experiment, there are four data files for each subject: one file for each combination of response type (correct or incorrect) and incongruent flanker type (target triangle pointing to the left or right). BrainVision Analyzer (Brain Products, Germany) only supports exporting these data files manually via a generic data export. In this generic data format, the recording of each electrode is saved as one line. Voltage information for each electrode is saved in segments that are 800 ms long and consist of 200 data points, corresponding to a sampling rate of 250 Hz or a sampling interval of 4 ms, respectively. This data format has to be restructured for creating the separability index and for classification learning: files have to be merged, and trial numbers and classes have to be marked. So in the next step, the data files were merged by a Python script so that there is one file per subject containing all trial data. In addition, the data matrix in each file is transposed so that electrodes are column variables and the recording data for each time point are listed in rows. For classification learning it is also important to have an additional class variable and a time-point variable, both including their values for each data point. Creating these variables is also done by the Python script, and the result is a csv file that is the ingoing data format for the creation of the separability index and for classification learning.
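A hedged sketch of this restructuring step: per-trial voltage matrices (electrodes × time points) are transposed so that electrodes become columns, and trial, class and time-point variables are added before everything is written to one csv file. All names and values below are invented stand-ins, not the thesis script:

```python
import numpy as np
import pandas as pd

channels = ["Fz", "FCz", "Cz", "Pz"]     # subset of the montage, for illustration
n_points = 200                           # 800 ms at a 4 ms sampling interval

rng = np.random.default_rng(1)
parts = []
for trial_id, label in [(1, "correct"), (2, "incorrect")]:
    data = rng.normal(size=(len(channels), n_points))       # fake generic-format segment
    df = pd.DataFrame(data.T, columns=channels)             # transpose: channels -> columns
    df.insert(0, "time_ms", np.arange(n_points) * 4 - 200)  # -200 ... 596 ms
    df.insert(0, "class", label)
    df.insert(0, "trial", trial_id)
    parts.append(df)

merged = pd.concat(parts, ignore_index=True)
csv_text = merged.to_csv(index=False)    # ingoing format for the later steps
```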

4.2.2 Feature Selection The feature selection procedure follows Blankertz et al. (2011). As described in Section 3.2.2, a visualization with separability measures is created by calculating separability indices for each pair of channel and time point. This is done by a spreadsheet in Microsoft Excel that uses the csv file as input. The measure used here is the signed r²-value of the pointwise biserial correlation coefficient (r-value). The resulting separability indices serve as the basis for a graphical inspection and the determination of a set of time intervals that are probably good candidates for classification. The intervals are determined heuristically such that their spatial patterns are as homogeneous as possible. After that, an average across time is calculated within each interval. Figures 4.6 and 4.7 contain the graphical result of the generation of separability indices for each participant. The depicted diagrams include all determined intervals, which are also summarized in Table 4.2.
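The separability measure can be written down compactly. The sketch below computes signed r²-values for one channel on invented data; the scaling follows the usual point-biserial definition (the thesis used an Excel spreadsheet for this step):

```python
import numpy as np

def signed_r_squared(x1, x2):
    """Signed r^2 of the point-biserial correlation per time point.

    x1, x2: (trials x time points) arrays for the two classes, one channel."""
    n1, n2 = len(x1), len(x2)
    pooled = np.concatenate([x1, x2], axis=0)
    r = (np.sqrt(n1 * n2) / (n1 + n2)) * (x1.mean(0) - x2.mean(0)) / pooled.std(0)
    return np.sign(r) * r ** 2

rng = np.random.default_rng(2)
correct = rng.normal(0.0, 1.0, size=(190, 200))    # majority class (invented)
incorrect = rng.normal(2.0, 1.0, size=(17, 200))   # minority class, shifted mean

sr2 = signed_r_squared(correct, incorrect)         # one value per time point

# Features are then formed by averaging the raw data across each chosen interval:
interval_feature = np.concatenate([correct, incorrect])[:, 50:60].mean(axis=1)
```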

[Figure: three panels (AHE02F, AVM27E, IGN31N), each showing signed r²-values per channel (Fz, FCz, Cz, Pz) over time from −100 to 500 ms relative to the response.]

Figure 4.6: Separability indices for participants with id AHE02F, AVM27E and IGN31N.


[Figure: three panels (IRL26N, IZR20F, NME11N), each showing signed r²-values per channel (Fz, FCz, Cz, Pz) over time from −100 to 500 ms relative to the response.]

Figure 4.7: Separability indices for participants with id IRL26N, IZR20F and NME11N.


4.2.3 Parameter Optimization A common practice in classification learning is to conduct a parameter optimization before training a classification technique on a dataset. This step is also done in this work. To avoid overfitting, a ten-fold cross-validation is used to evaluate parameter combinations. Parameters for Linear Discriminant Analysis In contrast to the implementation of LR in scikit-learn (Version 0.16; Pedregosa et al., 2011), the implementation of LDA (sklearn.lda.LDA) does not allow for many parameter settings. In addition, the aim of this work is to compare a state-of-the-art procedure as described by Blankertz et al. (2011) with LR. Since the authors recommend using shrinkage when the number of features is high and my dataset is high-dimensional, shrinkage is required for classification in this work. The solver can be set to lsqr, which means an algorithm using a least-squares solution is applied for learning. Another option would be an eigenvalue decomposition, but this failed in test runs. These two solvers are the only ones that can be used in combination with shrinkage. Thus, there is no need for a parameter optimization, and I apply an LDA with the following parameter settings to the dataset: • Solver: lsqr • Shrinkage: auto Parameters for Logistic Regression The documentation of scikit-learn (Version 0.16) lists the following optional parameters for the Python class sklearn.linear_model.LogisticRegression that are relevant for the parameter optimization in my approach: penalty This parameter specifies the norm used in the regularization, i.e., which cost function is minimized by the LR as an optimization problem. Possible values (string) are l1 or l2. C This parameter is the inverse of the regularization strength. Smaller values (float) cause stronger regularization.

40

fit_intercept Setting this parameter (boolean) to True adds a constant to the decision function as a bias or intercept.

solver This parameter specifies which algorithm is used for the optimization problem of the LR. Possible values are newton-cg, lbfgs, liblinear or sag. The solvers newton-cg and lbfgs support only l2 penalties. liblinear should be used for small datasets whereas sag is a good candidate for large ones. Since I have a small dataset, I only use the liblinear solver during parameter optimization.

class_weight Setting this optional parameter to balanced causes class weights to be adjusted automatically, inversely proportional to class frequencies in the input data. Alternatively, a dict can be used to define class weights. If no value is given, all classes get the weight one. Since I have an unbalanced dataset, I set this optional parameter to balanced. For parameter optimization I conducted a grid search in combination with a ten-fold cross-validation to find the parameter combination with the best area under the ROC curve. As described in Section 3.3.2, it is recommended to use area-under-the-curve values for the evaluation of imbalanced classification, because the accuracy score is insensitive to unbalanced class proportions in the dataset and therefore often misleading. I used the following parameter settings for the grid search with LR: • Penalty options: l1, l2 • C values: 0.1, 1, 10, 100, 1000 • Fit_intercept options: True, False • Solver options: liblinear • Class_weight options: balanced, None The results of the conducted grid search are summarized in Table 4.3.
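Sketched with scikit-learn's current API (the thesis used version 0.16, whose module layout differs slightly), the grid search might look as follows; X and y are random placeholders with roughly the same shape and imbalance as the real data:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold

rng = np.random.default_rng(3)
X = rng.normal(size=(210, 30))                       # ~210 trials, 30 features (invented)
y = np.r_[np.zeros(193), np.ones(17)].astype(int)    # imbalanced: 17 error trials

param_grid = {
    "penalty": ["l1", "l2"],
    "C": [0.1, 1, 10, 100, 1000],
    "fit_intercept": [True, False],
    "solver": ["liblinear"],                         # supports both l1 and l2
    "class_weight": ["balanced", None],
}
grid = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid,
    scoring="roc_auc",                               # AUC instead of accuracy (imbalance)
    cv=StratifiedKFold(n_splits=10, shuffle=True, random_state=0),
)
grid.fit(X, y)
best = grid.best_params_                             # cf. Table 4.3
```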

Table 4.3: Best parameter settings for logistic regression resulting from a grid search in combination with a 10-fold cross-validation for each participant

Subject ID   C     Penalty   Fit_intercept   Solver      Class_weight   ROC AUC
AHE02F       0.1   l2        True            liblinear   balanced       1.00 (+/- 0.00)
AVM27E       0.1   l1        True            liblinear   balanced       0.92 (+/- 0.21)
IGN31N       1     l1        True            liblinear   balanced       0.87 (+/- 0.45)
IRL26N       0.1   l2        False           liblinear   balanced       0.75 (+/- 0.37)
IZR20F       1     l1        True            liblinear   None           0.98 (+/- 0.05)
NME11N       1     l2        True            liblinear   balanced       0.85 (+/- 0.51)

4.3 Evaluation As described in Section 3.3.3, a stratified ten-fold cross-validation with random splits, repeated in ten iterations, is used to evaluate both classifiers – LDA with shrinkage of the covariance matrix and LR. To measure the performance of the classifiers, the accuracy score and the area under the ROC curve are used. After performing the cross-validations for both classifiers, the performance results are statistically compared by conducting a 2 x 10 repeated-measures analysis of variance (ANOVA) with the factors classifier (LDA vs. LR) and cross-validation iteration (10 iterations).
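A sketch of this evaluation scheme with present-day scikit-learn (where the LDA class lives in sklearn.discriminant_analysis rather than the old sklearn.lda); the data are random stand-ins with a little injected class signal, not the thesis dataset:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

rng = np.random.default_rng(4)
X = rng.normal(size=(210, 30))                  # invented feature matrix
X[193:] += 1.0                                  # give the minority class some signal
y = np.r_[np.zeros(193), np.ones(17)].astype(int)

lda = LinearDiscriminantAnalysis(solver="lsqr", shrinkage="auto")
lr = LogisticRegression(solver="liblinear", class_weight="balanced", C=0.1)

def repeated_cv_auc(clf, n_iterations=10):
    """Mean AUC over 10 iterations of a stratified ten-fold CV with random splits."""
    scores = []
    for i in range(n_iterations):
        cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=i)
        scores.append(cross_val_score(clf, X, y, scoring="roc_auc", cv=cv).mean())
    return float(np.mean(scores))

auc_lda = repeated_cv_auc(lda)
auc_lr = repeated_cv_auc(lr)
```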

4.3.1 Results for Classification The results of the ERN classification are summarized in Table 4.4. The mean performances of both classification techniques – LDA in combination with shrinkage of the covariance matrix and LR – are listed in this table for each participant and are depicted in Figure 4.8. Averaged AUC values as well as averaged accuracies resulting from 10 iterations of a ten-fold cross-validation are shown in columns 2-3 and 4-5, respectively. A comparison of the averaged performances over all participants is depicted in Figure 4.9. Over all participants, the LR classifier reached an area under the ROC curve of 0.91 and an overall accuracy of 0.92. The performance of the LDA classifier is 0.83 in terms of the AUC value and 0.91 in terms of overall accuracy. Figure 4.10 shows the ROC curves for both classifiers and each participant. Descriptively, the performance of the classifier with LR (yellow bars in Figure 4.9) is higher than the performance of the classifier with LDA (red bars in Figure 4.9). For AUC values,



Figure 4.8: Performance of LDA with shrinkage of the covariance matrix and LR for each participant. Mean AUC values (top) and accuracy values (bottom) over all 10 iterations, with their standard errors, are depicted for each participant.


Table 4.4: Classification results for LDA with shrinkage of the covariance matrix and LR: averaged performance metrics AUC and accuracy from 10 iterations of a stratified ten-fold cross-validation.

              AUC                        Accuracy
Subject ID    LDA          LR            LDA          LR
AHE02F        0.97         0.99          0.94         0.97
AVM27E        0.73         0.90          0.81         0.88
IGN31N        0.85         0.89          0.96         0.94
IRL26N        0.72         0.80          0.83         0.82
IZR20F        0.84         0.95          0.92         0.94
NME11N        0.88         0.90          0.95         0.94
Mean (SE)     0.83 (0.04)  0.91 (0.03)   0.90 (0.03)  0.92 (0.02)


Figure 4.9: Comparison of averaged performance of all participants for LDA with shrinkage of the covariance matrix and LR. The AUC values and accuracy values are shown on the left and right side, respectively.


this observation is confirmed by a 2 x 10 repeated-measures ANOVA with the factors classifier and iteration, which yielded a significant main effect of classifier, F(1, 5) = 10.178, p < .05, ηp² = .67. There is no significant main effect of iteration, F(1, 5) = .811, p = .61, ηp² = .14, and no significant interaction of classifier and iteration, F(1, 5) = 1.099, p = .38, ηp² = .18. For overall accuracy, the performance difference is not confirmed by the 2 x 10 repeated-measures ANOVA with the factors classifier and iteration, F(1, 5) = .939, p = .38, ηp² = .16. Additionally, there are no differences between iterations, F(1, 5) = .889, p = .54, ηp² = .15, and no interaction of classifier and iteration, F(1, 5) = 1.138, p = .36, ηp² = .19.
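A repeated-measures ANOVA of this kind can be computed, for example, with statsmodels' AnovaRM on a long-format table of per-subject AUC values; the numbers below are simulated placeholders, not the thesis results:

```python
import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# Simulated per-subject AUC values for a 2 x 10 within-subject design:
# 6 subjects x 2 classifiers x 10 cross-validation iterations.
rng = np.random.default_rng(6)
rows = []
for subject in ["s1", "s2", "s3", "s4", "s5", "s6"]:
    for classifier in ["LDA", "LR"]:
        for iteration in range(1, 11):
            auc = 0.85 + (0.05 if classifier == "LR" else 0.0) + rng.normal(0, 0.02)
            rows.append({"subject": subject, "classifier": classifier,
                         "iteration": iteration, "auc": auc})
df = pd.DataFrame(rows)

res = AnovaRM(df, depvar="auc", subject="subject",
              within=["classifier", "iteration"]).fit()
table = res.anova_table                          # F values, dfs and p-values
```

With six subjects and two classifier levels, the error degrees of freedom for the classifier effect are (2 − 1)(6 − 1) = 5, matching the F(1, 5) statistics reported above.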


[Figure: two ROC plots (true positive rate vs. false positive rate), one per classifier. LDA with shrinkage: ahe02f AUC = 0.98, avm27e AUC = 0.74, ign31n AUC = 0.85, irl26n AUC = 0.72, izr20f AUC = 0.85, nme11n AUC = 0.88. Logistic regression: ahe02f AUC = 0.99, avm27e AUC = 0.90, ign31n AUC = 0.89, irl26n AUC = 0.80, izr20f AUC = 0.95, nme11n AUC = 0.91.]

Figure 4.10: ROC curves of LDA with shrinkage of the covariance matrix and LR for each participant. The plot shows the ROC curve of each participant for both classifiers. It is an averaged curve built from 10 iterations with a stratified ten-fold cross-validation with random splits. The area under the ROC curve (AUC) value for each participant is shown in the legend.


5 Conclusion and Future Work In this work, a study was conducted to compare linear discriminant analysis (LDA), a state-of-the-art method in the domain of ERP component classification, with logistic regression (LR), a method that is far less known in this domain. In particular, both techniques were used for the classification of an error-related potential (ErrP), namely the error-related negativity (ERN). EEG data of ErrPs is typically characterized by sparsity and imbalance; in general, there are very few trials of erroneous responses in relation to trials of correct responses. So, when classifying ErrPs, one has to face the class imbalance problem. Another problem is the set of assumptions that have to be met when applying LDA. Although LDA is known to be quite robust against violated assumptions, there is evidence that non-normally distributed classes and non-equal covariance matrices of the classes can lead to worse classification performance. There is also evidence that the application of LR, a similar technique that makes fewer assumptions, is more suitable for datasets that do not fulfill the strict requirements of LDA. Furthermore, the application of LR is simple and fast, which makes it an attractive alternative for classifying ErrPs. In a direct comparison of both techniques applied to ErrP data from an Eriksen flanker task, LR reaches a better performance than LDA with shrinkage of the covariance matrix when taking the receiver operating characteristic (ROC) into account. When comparing overall accuracies, both techniques reach the same performance. Nevertheless, these results indicate an advantage of LR, since the ROC is the preferred measure for imbalanced classes and the interpretation of overall accuracies is problematic in this case. The worse performance of LDA can possibly be explained by the dataset, which may not meet the assumptions for the application of LDA.
An inspection of the EEG data regarding the fulfillment of the assumptions would give clearer evidence on this point. Other questions also remain unanswered and are worth considering in the future. Even though LDA fails to meet the expectations, it seems to perform quite well when applied to the dataset used in this work. The classification performance of LDA would certainly improve if the classes were more balanced, or rather if there were more representatives of incorrect responses. More representatives of incorrect responses could lead to a better estimate of the real feature distribution and more precise covariance matrices. This leads to the question of how many cases would be needed to reach better or equal performance with LDA in comparison with LR. A possible study could comprise a dataset with balanced classes whose balance is altered stepwise and used for classification with LDA and LR. This would allow one to observe whether there is a cut-off point where LDA performs similarly to LR or even better. Overall, LR seems to be a suitable alternative to LDA for the classification of ErrPs, or ERP components in general, when training data is sparse and imbalanced. Due to strong similarities between both classification techniques, LR is highly comparable to LDA with respect to its simplicity and speed of application.


6 References Batista, G. E. A. P. A., Prati, R. C., & Monard, M. C. (2004). A study of the behavior of several methods for balancing machine learning training data. ACM SIGKDD Explorations Newsletter, 6, 20–29. Blankertz, B., Lemm, S., Treder, M. S., Haufe, S., & Müller, K.-R. (2011). Single-trial analysis and classification of ERP components — A tutorial. NeuroImage, 56, 814–825. Borkenau, P. (2008). NEO-FFI: NEO-Fünf-Faktoren-Inventar nach Costa und McCrae (2nd ed.). Göttingen: Hogrefe. Botvinick, M., Nystrom, L. E., Fissell, K., Carter, C. S., & Cohen, J. D. (1999). Conflict monitoring versus selection-for-action in anterior cingulate cortex. Nature, 402, 179–181. Buttfield, A., Ferrez, P. W., & Millán, J. d. R. (2006). Towards a robust BCI: error potentials and online learning. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 14, 164–168. Chawla, N. V., Bowyer, K. W., Hall, L. O., & Kegelmeyer, W. P. (2002). SMOTE: synthetic minority over-sampling technique. Journal of Artificial Intelligence Research, 16, 321–357. Chawla, N. V., Japkowicz, N., & Kotcz, A. (2004). Editorial: special issue on learning from imbalanced data sets. ACM SIGKDD Explorations Newsletter, 6, 1–6. Cureton, E. E. (1956). Rank-biserial correlation. Psychometrika, 21, 287–290. Dal Seno, B., Matteucci, M., & Mainardi, L. (2010). Online detection of P300 and error potentials in a BCI speller. Computational Intelligence and Neuroscience, 2010. Duda, R. O., Hart, P. E., & Stork, D. G. (2001). Pattern classification (2nd ed.). New York: Wiley.


Efron, B. (1983). Estimating the error rate of a prediction rule: improvement on cross-validation. Journal of the American Statistical Association, 78, 316–331. Efron, B. & Tibshirani, R. J. (1993). An introduction to the bootstrap. London: Chapman & Hall. Eriksen, B. A. & Eriksen, C. W. (1974). Effects of noise letters upon the identification of a target letter in a nonsearch task. Perception & Psychophysics, 16, 143–149. Estabrooks, A., Jo, T., & Japkowicz, N. (2004). A multiple resampling method for learning from imbalanced data sets. Computational Intelligence, 20, 18–36. Falkenstein, M., Hohnsbein, J., Hoormann, J., & Blanke, L. (1990). Effects of errors in choice reaction tasks on the ERP under focused and divided attention. Psychophysiological Brain Research, 1, 192–195. Farwell, L. A. & Donchin, E. (1988). Talking off the top of your head: toward a mental prosthesis utilizing event-related brain potentials. Electroencephalography and Clinical Neurophysiology, 70, 510–523. Ferrez, P. W. & Millán, J. d. R. (2005). You are wrong! Automatic detection of interaction errors from brain waves. 19th International Joint Conference on Artificial Intelligence, 1413–1418. Ferrez, P. W. & Millán, J. d. R. (2008). Error-related EEG potentials generated during simulated brain-computer interaction. IEEE Transactions on Biomedical Engineering, 55, 923–929. Friedman, J. H. (1989). Regularized discriminant analysis. Journal of the American Statistical Association, 84, 165. Gehring, W. J., Coles, M. G. H., Meyer, D. E., & Donchin, E. (1990). The error-related negativity: an event-related brain potential accompanying errors. Psychophysiology, 27, S34. Green, M. D. & Swets, J. A. (1966). Signal detection theory and psychophysics. Huntington, NY: Krieger. He, H. & Garcia, E. A. (2009). Learning from imbalanced data. IEEE Transactions on Knowledge and Data Engineering, 21, 1263–1284. Holroyd, C. B. & Coles, M. G. H. (2002). The neural basis of human error processing: reinforcement learning, dopamine, and the error-related negativity. Psychological Review, 109, 679–709.


Kohavi, R. (1995). A study of cross-validation and bootstrap for accuracy estimation and model selection. IJCAI, 14, 1137–1145. Kubat, M. & Matwin, S. (1997). Addressing the curse of imbalanced training sets: one-sided selection. ICML, 97. Lotte, F., Congedo, M., Lecuyer, A., Lamarche, F., & Arnaldi, B. (2007). A review of classification algorithms for EEG-based brain-computer interfaces. Journal of Neural Engineering, 4, R1–R13. Luck, S. J. (2014). An introduction to the event-related potential technique (2nd ed.). Cambridge: The MIT Press. Lugauer, J. M. (2015). Influences of rule violation and rule exchange on the response behavior and electrophysiological correlates (Master thesis, Otto-Friedrich-Universität, Bamberg). Luu, P., Tucker, D. M., Derryberry, D., Reed, M., & Poulsen, C. (2003). Electrophysiological responses to errors and feedback in the process of action regulation. Psychological Science, 14, 47–53. Otte, M. R. (2015). Electrophysiological correlates of error commission in relation to personality (Master thesis, Otto-Friedrich-Universität, Bamberg). Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., . . . Duchesnay, É. (2011). Scikit-learn: machine learning in Python. The Journal of Machine Learning Research, 12, 2825–2830. Pohar, M., Blas, M., & Turk, S. (2004). Comparison of logistic regression and linear discriminant analysis. Metodološki zvezki, 1, 143–161. R Core Team. (2015). R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing. Raudys, S. J. & Jain, A. K. (1991). Small sample size effects in statistical pattern recognition: recommendations for practitioners. IEEE Transactions on Pattern Analysis & Machine Intelligence, 252–264. Riesel, A., Endrass, T., Kaufmann, C., & Kathmann, N. (2011). Overactive error-related brain activity as a candidate endophenotype for obsessive-compulsive disorder: evidence from unaffected first-degree relatives. The American Journal of Psychiatry, 168, 317–324. Schalk, G., Wolpaw, J. R., McFarland, D. J., & Pfurtscheller, G. (2000). EEG-based communication: presence of an error potential. Clinical Neurophysiology, 111, 2138–2144.


Schmidt, N. M., Blankertz, B., & Treder, M. S. (2012). Online detection of error-related potentials boosts the performance of mental typewriters. BMC Neuroscience, 13, 19. Spüler, M. & Niethammer, C. (2015). Error-related potentials during continuous feedback: using EEG to detect errors of different type and severity. Frontiers in Human Neuroscience, 9, 155. Student. (1908). The probable error of a mean. Biometrika, 6, 1–25. Treder, M. S. & Blankertz, B. (2010). (C)overt attention and visual speller design in an ERP-based brain-computer interface. Behavioral and Brain Functions, 6. van Schie, H. T., Mars, R. B., Coles, M. G. H., & Bekkering, H. (2004). Modulation of activity in medial frontal and motor cortices during error observation. Nature Neuroscience, 7, 549–554. Xie, J. & Qiu, Z. (2007). The effect of imbalanced data sets on LDA: a theoretical and empirical analysis. Pattern Recognition, 40, 557–562. Xue, J.-H. & Titterington, D. M. (2008). Do unbalanced data have a negative effect on LDA? Pattern Recognition, 41, 1558–1571. Yang, W.-H., Dai, D.-Q., & Ya, H. (2008). Feature extraction and uncorrelated discriminant analysis for high-dimensional data. IEEE Transactions on Knowledge and Data Engineering, 20, 601–614. Zou, K. H., O'Malley, A. J., & Mauri, L. (2007). Receiver-operating characteristic analysis for evaluating diagnostic tests and predictive models. Circulation, 115, 654–657.


7 Appendix 7.1 Performance Results



Table 7.1: Accuracy and AUC values of LDA in combination with shrinkage and LR for each participant and each iteration of a ten-fold cross-validation.

Participant  Iteration  LDA_AUC  LDA_Accuracy  LR_AUC   LR_Accuracy
irl26n       1          0.70859  0.82800       0.80960  0.81722
irl26n       2          0.74697  0.85679       0.82340  0.82277
irl26n       3          0.72593  0.80702       0.79495  0.81839
irl26n       4          0.73704  0.83885       0.80017  0.82307
irl26n       5          0.71111  0.85552       0.75960  0.82153
irl26n       6          0.77205  0.81806       0.83098  0.84006
irl26n       7          0.71582  0.83545       0.79377  0.80862
irl26n       8          0.68939  0.82836       0.81364  0.85029
irl26n       9          0.69115  0.83545       0.80387  0.82895
irl26n       10         0.69192  0.82339       0.79158  0.81751
ahe02f       1          0.96870  0.93500       0.99293  0.96318
ahe02f       2          0.98689  0.93504       0.99293  0.96727
ahe02f       3          0.97300  0.93500       0.99293  0.97273
ahe02f       4          0.98675  0.94797       0.99293  0.97251
ahe02f       5          0.98277  0.94435       0.99495  0.97706
ahe02f       6          0.92582  0.91251       0.97071  0.95433
ahe02f       7          0.98624  0.93866       0.99091  0.97615
ahe02f       8          0.98285  0.95909       0.99091  0.97636
ahe02f       9          0.97929  0.94478       0.99091  0.97160
ahe02f       10         0.97071  0.93864       0.99091  0.98115
izr20f       1          0.85152  0.91596       0.96566  0.95155
izr20f       2          0.89520  0.91752       0.96212  0.93095
izr20f       3          0.86676  0.91655       0.95455  0.92525
izr20f       4          0.85379  0.90329       0.95960  0.93925
izr20f       5          0.82393  0.92128       0.93788  0.94284
izr20f       6          0.81409  0.93055       0.94545  0.94287
izr20f       7          0.77684  0.92523       0.96111  0.95590
izr20f       8          0.87904  0.91728       0.94141  0.93946
izr20f       9          0.82571  0.92582       0.94343  0.93925
izr20f       10         0.84600  0.92979       0.94646  0.93872
ign31n       1          0.80637  0.97117       0.85556  0.93394
ign31n       2          0.82940  0.96617       0.92576  0.95184
ign31n       3          0.83923  0.95206       0.92222  0.93734
ign31n       4          0.85246  0.96706       0.84444  0.93660
ign31n       5          0.86395  0.96615       0.92475  0.93186
ign31n       6          0.81908  0.96617       0.86869  0.93325
ign31n       7          0.87720  0.96208       0.88434  0.93301
ign31n       8          0.89652  0.95732       0.89141  0.96141
ign31n       9          0.83944  0.96162       0.90657  0.94799
ign31n       10         0.87584  0.96186       0.88131  0.94710
avm27e       1          0.70744  0.83740       0.91481  0.89370
avm27e       2          0.72747  0.80725       0.90387  0.87532
avm27e       3          0.75162  0.79350       0.89916  0.88110
avm27e       4          0.78982  0.83694       0.89545  0.89949
avm27e       5          0.71284  0.82490       0.90303  0.87574
avm27e       6          0.71511  0.80527       0.91010  0.88745
avm27e       7          0.76998  0.81272       0.90539  0.87574
avm27e       8          0.74560  0.80137       0.91128  0.88819
avm27e       9          0.72105  0.79007       0.90185  0.88194
avm27e       10         0.75898  0.79314       0.87121  0.86319
nme11n       1          0.88030  0.94190       0.90612  0.95188
nme11n       2          0.86768  0.95188       0.87831  0.93779
nme11n       3          0.91818  0.96117       0.93586  0.93258
nme11n       4          0.88131  0.95186       0.88650  0.94712
nme11n       5          0.83636  0.94167       0.85742  0.92714
nme11n       6          0.84949  0.95255       0.89400  0.93303
nme11n       7          0.90606  0.96119       0.90404  0.94119
nme11n       8          0.83384  0.94117       0.87735  0.93641
nme11n       9          0.91111  0.94688       0.94192  0.93688
nme11n       10         0.87172  0.95617       0.94747  0.94617


7.2 Media Contents A compact disc containing the following contents is attached to this thesis: • Raw EEG data of Eriksen Flanker task • Python script for preparing EEG data • Microsoft Excel spreadsheets for creating the separability indices • R script for creating diagrams of separability indices • SPSS script for feature selection • Data used for classification learning and testing • Python scripts for parameter optimization and classification learning • Performance results • A digital version of this thesis


Eidesstattliche Erklärung (Statutory Declaration) I hereby declare that I wrote this thesis independently and used only the sources and aids indicated by me. Content taken from other works, whether paraphrased or quoted verbatim, has been marked as such. This thesis has not been submitted in this or a comparable form to any other examination board.

Date:

Signature:
