WEB ALGORITHM SEARCH ENGINE BASED NETWORK MODELING OF MALARIA TRANSMISSION

Author: Rosanna Maxwell

EZE, MONDAY OKPOTO

A thesis submitted in fulfillment of the requirements for the degree of Doctor of Philosophy (PhD) in Computer Science.

Faculty of Computer Science and Information Technology
UNIVERSITI MALAYSIA, SARAWAK
2013

ACKNOWLEDGEMENTS

I thank God, who granted me the needed strength throughout the period of this doctoral research. My supervisors, Assoc. Prof. Dr Jane Labadin and Terrin Lim, deserve my appreciation for their support and guidance in making this work a huge success. I also acknowledge the support of the Dean of the Faculty of Computer Science and Information Technology, UNIMAS, Prof. Dr. Narayanan Kulathuramaiyer, for creating the needed atmosphere for research and learning in the faculty. My appreciation further goes to the Center for Graduate Studies and the leadership of UNIMAS for granting me financial sponsorship through the Zamalah Postgraduate Scholarship program. Similarly, I acknowledge with gratitude the Ministry of Higher Education Malaysia for supporting this work through the Fundamental Research Grant Scheme FRGS/2/10/SG/UNIMAS/02/04. My international conferences were financed by this research grant. Finally, I thank my wife, children, parents and in-laws for their patience, understanding and moral support throughout these three years of being away from home as a result of my postgraduate studies.

DEDICATION

To my dear wife, Ifeoma Faith Eze (Mrs), and all my children, for their patience, prayers and support all through these years of being away from my home country to pursue my doctorate degree. To my parents, Chief and Mrs Patrick Eze Okpoto, for training me. To my in-laws, Chief and Chief (Mrs) Obikpe, for all their care.

ABSTRACT

Malaria has been described as one of the most dangerous and most widespread tropical diseases, with an estimated 247 million cases around the globe in the year 2006 alone. This calls for urgent scientific intervention. Since malaria is a vector-borne disease, this research tackled the issue of malaria transmission from the angle of vector detection through a search engine. There are observed cases of vector control being attempted on a trial-and-error basis, with no scientific way of determining the locations of critical vector densities. Unfortunately, such a practice wastes resources on the wrong places while ignoring the areas where vectors are critically present. This research formalizes a contact network using a number of attributes of the malaria vectors, the public places, and the human beings that affect malaria transmission. The resulting structure is a heterogeneous bipartite contact network of two node types: the public places nodes and the human beings nodes. The human beings are those who have suffered from malaria even though their residential homes were under reliable vector control. This exclusion principle makes it clear that these people most probably contracted the disease outside their residential homes. The Hypertext Induced Topical Search (HITS) web search algorithm was adapted to implement a search engine that uses the bipartite contact network as its input. MATLAB was used to implement the model system. The output shows the public places which harbour the infected malaria vectors, and their corresponding vector densities. The model output was validated with UCINET 6.0 as the benchmark system. A root mean square error (RMSE) value of 0.0023 was obtained when the output of the benchmark system was compared with that of the search engine model. This result indicates a high and acceptable level of accuracy.

ABSTRAK

Malaria is one of the most dangerous and widespread tropical diseases, with an estimated 247 million cases worldwide in 2006 alone. This situation requires urgent scientific intervention. Since malaria is a disease caused by a vector, this study attempts to address the issue of malaria transmission through vector detection using a search engine. There are cases in which vector control is attempted without a scientific method for determining the areas of critical vector density. Such a practice wastes resources on the wrong areas while neglecting the areas where vectors are critical. This study builds a contact network using several attributes of the malaria vectors, the public places, and the human beings that influence malaria transmission. The resulting structure is a heterogeneous bipartite contact network consisting of two node types: public places and human beings. The human beings still fell victim to malaria infection even though their homes were protected by good vector control. Based on this exclusion principle, it is clear that the victims were most probably infected outside their residential areas. The Hypertext Induced Topical Search (HITS) web search algorithm was adapted to implement a search engine that uses the bipartite contact network as input. MATLAB was used to implement the model system. As a result, the model shows the public places that hold infected malaria vectors, together with their vector densities. The model output was validated using UCINET 6.0 as the benchmark system. A Root Mean Square Error (RMSE) value of 0.0023 was obtained when the output of the benchmark system was compared with that of the search engine model. This result indicates a high and acceptable level of accuracy.

LIST OF PUBLICATIONS/RESEARCH PRESENTATIONS

Eze, M., Labadin, J., Lim, T. (2010, May 12-13). Role of Computational Science in Malaria Research. In the Proceedings/Book of Abstracts of Young ICT Researchers Colloquium 2010, FCSIT UNIMAS, p. 40.

Eze, M., Labadin, J., Lim, T. (2011a, Mar 20-22). Emerging Computational Strategy for Eradication of Malaria. The Proceedings of 2011 IEEE Symposium on Computers & Informatics (IEEE/ISCI 2011), Kuala Lumpur, pp. 715-720.

Eze, M., Labadin, J., Lim, T. (2011b, June 17-19). Mosquito Flight Model and Applications in Malaria Control. In Proc. of 3rd International Conference on Computer Engineering and Technology (ICCET 2011), Kuala Lumpur, pp. 59-64.

Eze, M., Labadin, J., Lim, T. (2011c, July 18-22). The Binary Tree-Based Heterogeneous Network Link Model for Malaria Research. In Proceedings of 7th International Congress for Industrial and Applied Mathematics Conf. (ICIAM 2011), Vancouver, Canada, pp. 546-547.

Eze, M., Labadin, J., Lim, T. (2011d, July 11-14). Contact Strength Generating Algorithm for Application in Malaria Transmission Network. In Proceedings of 7th International Conference on IT in Asia (CITA 2011), Kuching, Sarawak, Malaysia, pp. 21-26.

Eze, M., Labadin, J., Lim, T. (2012, January). Structural Convergence of WebGraph, Social Network & Malaria Network: An Analytical Framework for Emerging Web-Hybrid Search Engine. Accepted for 2nd Review by the International Journal of Web Eng. & Tech. (IJWET).

Short Publications/Research Summaries Presented

Eze, M., Labadin, J., Lim, T. (2011). Network Modeling of Malaria Transmission. Research summary published in the FCSIT 2011 Research Bulletin.

Eze, M., Labadin, J., Lim, T. (2012, Feb 29). Network Modeling of Malaria Transmission. Research summary presented at the FCSIT UNIMAS Open Day 2012.

Eze, M., Labadin, J., Lim, T. (2012, March 21-22). Network-based Modeling of the Transmission of Mosquito-Borne Disease. Research poster presented at the 5th UNIMAS Research EXPO 2012.

Eze, M., Labadin, J., Lim, T. (2012, May 22). Network Modeling of Vector-Borne Diseases. Research summary presented at a research forum between Sarawak Health Department and Computational Sciences Department, FCSIT UNIMAS.

Labadin, J., Lim, T. & Eze, M. (2012, June). Network Modelling of Malaria Transmission. UNIMAS Research Update 2012, Vol. 8, No. 1, p. 11.

TABLE OF CONTENTS

TITLE PAGE
ACKNOWLEDGEMENTS
DEDICATION
ABSTRACT
ABSTRAK
LIST OF PUBLICATIONS/RESEARCH PRESENTATIONS
TABLE OF CONTENTS
LIST OF APPENDICES
LIST OF TABLES
LIST OF FIGURES
LIST OF EQUATIONS
LIST OF ABBREVIATIONS

CHAPTER 1: INTRODUCTION
1.0 OPENING
1.1 BACKGROUND OF STUDY
1.2 RESEARCH PROBLEMS
1.3 RESEARCH QUESTIONS
1.4 OBJECTIVES OF STUDY
1.5 SCOPE OF STUDY
1.6 SIGNIFICANCE OF STUDY
1.7 RESEARCH METHODOLOGY
1.8 THESIS OUTLINE

CHAPTER 2: LITERATURE SURVEY
2.0 INTRODUCTION
2.1 WHAT IS MALARIA?
    2.1.1 Malaria Lifecycle
    2.1.2 Malaria Lifecycle and Contact Networks
2.2 COMPUTATIONAL EPIDEMIOLOGY OF MALARIA
    2.2.1 Malaria Transmission Factors
        2.2.1a Demographic Factors
        2.2.1b Human and Socioeconomic Factors
        2.2.1c Biological and Clinical Factors
        2.2.1d Topological and Environmental Factors
2.3 PUBLIC PLACES IN DISEASE TRANSMISSION
2.4 FROM GRAPH THEORY TO NETWORKS
2.5 BIPARTITE NETWORKS
2.6 CONTACT NETWORKS
2.7 MOSQUITO BEHAVIOUR IN CONTACT NETWORKS
2.8 CONTACT STRENGTH DETERMINING FACTORS
    2.8.1 Important Deductions and the way forward
2.9 STRUCTURAL SIMILARITY RESEARCH
2.10 BACKGROUND STUDY OF HITS ALGORITHM
2.11 WEB SEARCH ENGINE APPLICATIONS IN NON-WEB FIELDS
2.12 CRITICAL APPRAISAL OF EXISTING METHODOLOGIES
2.13 CHAPTER SUMMARY

CHAPTER 3: CONTACT NETWORK MODEL FORMALIZATION
3.0 INTRODUCTION
3.1 CONTACT NETWORK STRUCTURAL REPRESENTATION
    3.1.1 Contact Network Structural Definitions
    3.1.2 Malaria Contact Network Structural Problem
    3.1.3 Contact Network Construction in Real Life
3.2 MALARIA VECTOR ACTIVITY MODELS
    3.2.1 Malaria Life Cycle Duration Model
    3.2.2 Malaria Vector Biting Model
    3.2.3 Malaria Vector Abundance Model
    3.2.4 Malaria Vector Survival Model
    3.2.5 Larval Count Estimation Model
3.3 PUBLIC PLACES MODELS
    3.3.1 Expected Number of Annual Working Days Model
    3.3.2 Actual Number of Annual Working Days Model
3.4 HUMAN BEINGS PARAMETERS
3.5 CONTACT NETWORK PARAMETER ASSIGNMENTS
3.6 THE CONTACT STRENGTH MODEL CALCULATIONS
3.7 THE CONTACT STRENGTH NORMALIZATION
3.8 CHAPTER SUMMARY

CHAPTER 4: SYSTEM DESIGN, IMPLEMENTATION AND RESULTS
4.0 INTRODUCTION
4.1 SEARCH ALGORITHM FEATURES
    4.1.1 Weight Matrix Generation by HITS Algorithm in the Web - PS1
    4.1.2 Weight Matrix Generation by HITS in Malaria Network - PS2
4.2 SYSTEM DESIGN
    4.2.1 SEARCH ENGINE WORKFLOW DESIGN
        4.2.1a Inputs Section
        4.2.1b Transformation Section
        4.2.1c Search and Indexing Section
        4.2.1d Interpretation of Search Result
    4.2.2 EXTENDED CONTRIBUTIONS SECTION
        4.2.2a Contact Network Crowd Analysis
        4.2.2b Contact Network Evaluation Engine
        4.2.2c Malaria Indirect Transfer Analysis
        4.2.2d Pyramidal Visualization System
    4.2.3 System Output Section
4.3 SYSTEM IMPLEMENTATION
    4.3.1 SYSTEM IMPLEMENTATION ENVIRONMENT
    4.3.2 SYSTEM OPTIMIZATION STRATEGIES
        4.3.2a Storage Space Saving through Sparse Matrix Application
        4.3.2b Speed Improvement Benefit
        4.3.2c Fault Avoidance Strategy
    4.3.3 IMPLEMENTATION LIMITATIONS AND CHALLENGES
4.4 CHAPTER SUMMARY

CHAPTER 5: MODEL VALIDATION
5.0 INTRODUCTION
5.1 MODEL VALIDATION FRAMEWORK
5.2 BENCHMARK VALIDATION
    5.2.1 Benchmark Validation Platform
    5.2.2 Benchmark Validation Workflow
        5.2.2a Benchmark Validation Task 1 (Loading Data into System)
        5.2.2b Benchmark Validation Task 2 (System Runs and Output)
        5.2.2c Benchmark Validation Task 3 (Error Analysis)
        5.2.2d Benchmark Validation Task 4 (Interpretation of Result)
5.3 ANALYTICAL VALIDATION
    5.3.1 Opening Time analysis of MCPP
    5.3.2 Network Crowding and Vector Density Correlation Analysis
    5.3.3 Contact Strength and Vector Density Correlation Analysis
5.4 VALIDATION RESULT DISCUSSION
    5.4.1 Discussion on Benchmark Validation
    5.4.2 Discussion on Analytical Validations

CHAPTER 6: SUMMARY AND CONCLUSION
6.0 INTRODUCTION
6.1 SUMMARY OF CURRENT RESEARCH
6.2 MAIN CONTRIBUTIONS
6.3 FUTURE RESEARCH
    6.3.1 Wide Area Malaria Vector Density Mapping Project
    6.3.2 Vector-borne Disease Flow Path Modeling
    6.3.3 The Wind, Flood and Malaria Vectors in Contact Networks
    6.3.4 Partitioned Version of the Contact Network
6.4 DEPLOYMENT INFORMATION
6.5 CONCLUSION

REFERENCES

LIST OF APPENDICES

APPENDIX MP: Extended Implementation Section
APPENDIX TT: Tables of Implementation Related-Data
APPENDIX FF: Implementation Flow Charts
APPENDIX IO: Implementation Outputs
APPENDIX TR: Implementation Test Run Messages (Minimal Listing)
APPENDIX SC: Implementation Source Code

LIST OF TABLES
(These are the tables in the main text. The other tables in the appendices are not listed here.)

Table 3.01A: Feasibility Research Summary Table
Table 3.01B: Link Matrix (Columns 1-20)
Table 3.01C: Link Matrix (Columns 21-40)
Table 3.01D: Bin2dec and Dec2bin Conversions
Table 4.03T: Crowd Matrix
Table 4.04T: Sample Link Matrix
Table 4.05T: Sparse Matrix
Table 4.09T: Summary of results
Table 5.02T: RMSE Analysis Detailed Calculation Table
Table 5.03T: Vector Density vs Crowd Correlation Calculation Table
Table 5.04T: Vector Density vs Contact Strength Correlation Table

LIST OF FIGURES
(These are the figures in the main text. The other figures in the appendices are not listed here.)

Fig. 1.01F: Minimal Flowchart of the Methodology
Fig. 2.01F: Literature Survey Domains
Fig. 2.02F: Developmental Life Cycle of Malaria
Fig. 2.03F: Contact Building Block Diagram
Fig. 2.04F: Epidemiological Triangle
Fig. 2.05F: Average Global Household Size
Fig. 2.06F: A Sample Graph
Fig. 2.07F: Graph vis-à-vis Network Structures
Fig. 2.08F: A Sample Network Modeled from Simple Graph Structures
Fig. 2.09F: Minimal Algorithm on progression from Graph to Network Model
Fig. 2.10F: A Sample 3P by 5H Contact Network
Fig. 2.11F: Single Node Network
Fig. 2.12F: Mosquito Behavioural Model Facts
Fig. 2.13F: Measure of Level of Contacts in Disease Related Researches
Fig. 2.14F: Measure of Level of Contacts in Non-Disease Related Researches
Fig. 2.15F: A Sample contact network with arbitrary edge weights
Fig. 2.16F: Demonstration of Link and Weight Matrices
Fig. 2.17F: Illustrations of Web Graph and Social Network
Fig. 2.18F: Transformation into Adjacency Matrix
Fig. 2.19F: Illustration of Dynamic and Static Network
Fig. 2.20F: Non-web fields with web algorithm search engines
Fig. 2.21F: Classification of Malaria vector species
Fig. 3.01F: Model Formalization Coverage Areas
Fig. 3.02F: A Sample Contact Network Diagram
Fig. 3.03F: Public Places used for Vector Existence Feasibility Research
Fig. 3.04F: Contact Strength Model Block Diagram
Fig. 3.05F: Partial Sketch of the Model Contact Network
Fig. 4.01F: System Design and Implementation Coverage Areas
Fig. 4.02F: Search Engine Comparative Features Diagram
Fig. 4.03F: System Design Framework
Fig. 4.06F: System Workflow
Fig. 4.07F: Structural Attributes of the Hub and Authority Matrices
Fig. 4.08F: Sketch of the Implementation Iteration Steps
Fig. 4.09F: Result derived from Indexing Operation
Fig. 4.10F: Crowd Analysis Workflow
Fig. 4.11F: Public Places Crowd Graph
Fig. 4.12F: Indirect Transfer Analysis Workflow Design
Fig. 4.13F: Similarity measures related to the most critical public place P2
Fig. 4.14F: Pyramidal Visualization Workflow
Fig. 4.15F: Sparse Matrix Transformation into three Linear Vectors
Fig. 4.16F: Sparse Matrix Transformation into three Linear Vectors
Fig. 4.23F: Space saving through use of sparse matrices
Fig. 5.01F: Model Validation Coverage Areas
Fig. 5.02F: System Validation Framework
Fig. 5.03F: Benchmark Validation Broken into 4 Specific Tasks
Fig. 5.10F: Benchmark Ranking Result (B-Result)
Fig. 5.11F: Result of Sorting the M-Result and B-Result Datasets
Fig. 5.12F: Details of the Calibration Operations
Fig. 5.13F: Error analysis datasets (B-Result and M-Result)
Fig. 5.14F: MCPP Open Time Analysis Result

LIST OF EQUATIONS

Equation (2.1): Definition of Link Matrix
Equation (2.2): Definition of Adjacency Matrix
Equation (2.3): Hub Matrix Transformation Equation
Equation (2.4): Authority Matrix Transformation Equation
Equation (3.1): Contact Network Components Set Equation
Equation (3.2): Polynomial Fit Temperature Normalization Equation
Equation (3.3): Malaria Life Cycle Duration Model Equation
Equation (3.4): Malaria Vector Biting Model Equation
Equation (3.5): Malaria Vector Abundance Model Equation
Equation (3.6): Polynomial Fit Elevation Normalization Equation
Equation (3.7): Malaria Vector Survival Model Equation
Equation (3.8): Larval Count Estimation Model Equation
Equation (3.9): Expected Number of Annual Working Days Model
Equation (3.10): Actual Number of Annual Working Days Model
Equation (3.11): Contact Strength Calculation Model
Equation (3.12): Expanded Contact Strength Calculation Model
Equation (3.13): Contact Strength Normalization Equation
Equation (4.1): Contact Strength to Hub Derivation Equation
Equation (4.2): Contact Strength to Authority Hub Derivation Equation
Equation (4.3): Power Method Eigen Equation
Equation (4.4): EigenVector Estimation Iteration Equation
Equation (4.5): Contact Strength Max2Max Ratio Evaluation Equation
Equation (4.6): Crowd Max2Max Ratio Evaluation
Equation (4.7): Jaccard Similarity Coefficient
Equation (4.8-13): Space Savings Benefit Calculation Equations
Equation (5.1): RMSE Definition Equation
Equation (5.2): Network Crowding Vs Vector Density Correlation Equation
Equation (5.3): Contact Strength and Vector Density Correlation Equation

LIST OF ABBREVIATIONS

ABBREVIATION: MEANING
#EWD: Expected Annual Number of Working Days
#YEARMIN: Actual Annual Working Days (designated by #YEARMIN)
AUTH: Authority Matrix
CnetSimVer1.0: Contact Network Simulation System Version 1.0
CO2: Carbon dioxide
FlowSNxxx: Stands for ‘Flow Chart Code xxx’
HITS: Hypertext Induced Topical Search
MATLAB: Matrix Laboratory
SCodeSNxxx: Source Code Serial number
OUTPUT.LOGx: Log generated by UCINET during system runs
Pidx: Public Place Index Column
PPCTx: Public Place Close Time for a given public place Px
PPDTx: Public Place Time Duration for a given public place Px
PPOTx: Public Place Open Time for any public place Px
PPSim: Public Places Similarity
Pscore: Public Place Rank Score
RMSE: Root Mean Square Error
SIM: Similarity Measure Function
UNIMAS: Universiti Malaysia, Sarawak
xP by wH: Contact network of x public places and w human nodes (x, w: integers)

CHAPTER ONE INTRODUCTION

1.0 OPENING

The domain of this research is computational modeling, and the specific area is the modeling of vector-borne disease transmission using contact network models, with a view to detecting locations that are likely to hold a high density of infected mosquitoes. To do this, the research problem must first be cast as a network model in which the nodes and edges are clearly defined for the purpose of that detection. The problem-solving steps are then developed, designed and implemented on an appropriate computer programming platform. The network structure is transformed into a suitable format so that it can serve as the input to the model. The model is run and validated, and a number of analytical results are generated.

The motivation for this study is as follows. The conventional approach to disease modeling, commonly known as compartmental modeling, is to construct differential equations. Unfortunately, such models cannot detect locations that are likely to hold a high density of infected mosquitoes. Locating the public places that harbour malaria vectors is very important for the eradication of malaria because, without it, vector control efforts will be wasted on areas of less importance. Hence, the main contribution of this research is to demonstrate a new approach to modeling vector-borne disease transmission.
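To make the limitation of the compartmental approach concrete, the following is a minimal, illustrative sketch of a standard SIR compartmental model, with dS/dt = -βSI/N, dI/dt = βSI/N - γI and dR/dt = γI, integrated with a simple forward-Euler step. It is not taken from this research: the language (Python), the function name `simulate_sir`, the parameter values and the step size are all assumptions chosen only for illustration.

```python
def simulate_sir(beta, gamma, s0, i0, r0, days, dt=0.1):
    """Integrate the classic SIR equations with a forward-Euler step.

    beta  : transmission rate per day (illustrative value, an assumption)
    gamma : recovery rate per day (illustrative value, an assumption)
    Returns a list of (time, S, I, R) tuples.
    """
    n = s0 + i0 + r0                      # total population, kept constant
    s, i, r = float(s0), float(i0), float(r0)
    history = [(0.0, s, i, r)]
    steps = int(round(days / dt))
    for k in range(1, steps + 1):
        # Homogeneous mixing: every susceptible meets every infected
        # individual with the same probability, hence the S * I / N term.
        new_infections = beta * s * i / n * dt
        new_recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((k * dt, s, i, r))
    return history


# Hypothetical population of 1000 people with 10 initial infections.
history = simulate_sir(beta=0.3, gamma=0.1, s0=990, i0=10, r0=0, days=160)
final_time, final_s, final_i, final_r = history[-1]
print(f"day {final_time:.0f}: S={final_s:.1f}, I={final_i:.1f}, R={final_r:.1f}")
```

The state of such a model is just three population-level totals. Nothing in it refers to a place, which is why a model of this form cannot indicate where infected vectors are concentrated.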

1.1 BACKGROUND OF STUDY

Malaria is a vector-borne disease that results from blood infection by protozoan parasites of the genus Plasmodium, which are transmitted from one human being to another by female Anopheles mosquitoes (Richard & Kamini, 2002). The four species of malaria parasites that infect humans are Plasmodium falciparum, Plasmodium vivax, Plasmodium malariae and Plasmodium ovale. Malaria is one of the most dangerous and most widespread tropical diseases, according to Global Risk Forum (2009). As reported by WHO (2008), there were an estimated 247 million malaria cases worldwide in 2006, causing nearly a million deaths, mostly of children under 5 years. It has been stated that about 3.3 billion people (about half of the world's population) are at risk of malaria. Every year, this leads to about 250 million malaria cases and nearly one million deaths. People living in the poorest countries are the most vulnerable (WHO, 2010). One study named malaria as one of the root causes of poverty (Malaria Consortium, 2010). It has been estimated that malaria cuts economic growth rates by as much as 1.3% in countries with high disease rates (UNDP/World Bank/WHO, 2003). A child dies of malaria every 30 seconds (National Institute of Allergy and Infectious Disease NIAID, 2010).

Historical surveys show that malaria has existed for centuries, and several eradication efforts have failed to make the desired impact. Cox (2010) also recounted that for about 2500 years there was an erroneous belief that malaria resulted from polluted air rising from swamps. Kelly-Hope & McKenzie (2009) cited malaria as the most serious mosquito-borne disease. The discoveries of the malaria parasite (Pettersson, 2005) and the malaria vector (Feachem et al., 2009) in the 19th century are important milestones in malaria research, promoting “scientific precision” over “trial and error”. The Bill & Melinda Gates Foundation (2009) pointed out that research towards the eradication of malaria is an unavoidable venture.

The goal of this research is to tackle the issue of malaria transmission through vector detection. This is done by applying a search engine to a contact network, with the aim of detecting the public places that harbour malaria vectors and ranking such public places in terms of their vector densities. Public places
(for example markets, schools and others) are chosen because they accommodate a larger number of human beings than average residential homes. Since disease spread increases with population size, transmission through the public places is expected to increase and to affect more human beings. Vector control itself may not be successful without reliable scientific tools that detect the locations needing urgent vector control attention.

A conventional approach to studying disease transmission is compartmental modeling, which employs a system of differential equations (Ladeau et al., 2011). The SIR Model (Dimitrov & Meyers, 2010), which breaks the population into three compartments (susceptible, infected and recovered), is in this category. Unfortunately, the compartmental modeling approach is based on some unrealistic assumptions, one of which is the concept of homogeneous mixing. This is the assumption that all individuals in a transmission environment have equal chances of mixing with others, and hence have a uniform probability of contracting the disease. Network modeling is an improvement over the compartmental approach in the sense that it can depict the complexity of the real world (Craft & Caillaud, 2011) by capturing the interactions that lead to disease transmission. Hence, rather than simply assuming that all individuals have equal chances of contracting the disease, the network approach takes note of the fact that contacts always vary, and that the probability of disease transmission is proportional to the level of such contacts.

Contact network modeling is rooted in graph theory. Before defining contact networks, it is important to first define graphs and networks, and to point out the relationship between the two concepts. A graph is a mathematical structure made up of a set of points called nodes that are connected by lines called edges. A network is a graph where the nodes and edges have been assigned meaningful values. The word meaningful in this sense implies that the resulting structure automatically becomes associated
with a particular field of study. For instance, a road network is a graph structure where the nodes represent different cities, the edges represent the roads connecting the cities, and the edge labels represent the actual lengths of the roads. Hence, a graph is a mathematical model of networks. From the angle of the object-oriented paradigm, a network is simply an instantiation of a graph object.

A contact network is a graph structure where each node represents a person (or location), and the edges represent contacts among people (or locations) in the network (Meyers, 2007). In infectious disease epidemiology, a contact network depicts interactions that can lead to disease transmission. Human infectious diseases are transmitted as a result of human contact with other infected human beings, with locations, or with non-human infectious agents, depending on the disease in question. For instance, lung infections are contracted by being in locations with particulate air pollution (Fullerton et al., 2008). Malaria transmission takes place when a human is bitten by infected vectors. The human contact in this case takes place within a location where these infected vectors thrive. A contact network is therefore a structure that models a disease transmission environment as a set of nodes and edges, such that the disease is transmitted from one node to another through the edges (Salathe & Jones, 2010). In contact networks, the higher the edge weight (a measure of the level of possible contact), the higher the probability of transmission between adjacent nodes (Schumm et al., 2007).

A contact network can be categorized as either homogeneous (single node type) or heterogeneous. A single node contact network is a network in which all the nodes are of the same type, while a heterogeneous network is one where the nodes are of different types, and hence have different behavioural attributes. A directly transmitted contagious disease such as smallpox can be modeled using a single node network since transfer is directly from person to person, unlike in malaria where vectors are involved. The complexity is minimal when modeling such a
disease since only a single node type is involved in the disease transmission environment. Malaria transmission is different: it requires a heterogeneous contact network in which the two node types, ‘public places’ and ‘human beings’, have a number of dissimilar attributes. For instance, human beings usually move about, whereas public places are stationary. Furthermore, since malaria vectors are mobile (they can fly), their attributes have to be factored into the model, which makes heterogeneous contact network modeling more complex. The research problem here is therefore the need to detect and rank the public places that account for the infection of the human beings, since these places are the reservoirs of infected malaria vectors.
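The detect-and-rank task described above can be sketched in code. The following is a minimal illustration, written in Python rather than the MATLAB used in this research, of a HITS-style mutual reinforcement iteration on a small weighted bipartite contact network. The 3 public places by 5 human beings layout, the edge weights, the function names `normalise` and `hits_scores`, and the choice to score a place by summing over the humans linked to it are all assumptions made for the example. The model actually used in this research is formalized in later chapters.

```python
import math


def normalise(vector):
    """Scale a vector to unit Euclidean length (left unchanged if all zeros)."""
    length = math.sqrt(sum(v * v for v in vector))
    return [v / length for v in vector] if length > 0 else list(vector)


def hits_scores(weights, max_iterations=200, tolerance=1e-12):
    """HITS-style iteration on a places-by-humans weight matrix.

    weights[p][h] is the (hypothetical) contact strength between public
    place p and human h, or 0.0 when there is no edge.
    Returns (place_scores, human_scores), each of unit length.
    """
    n_places, n_humans = len(weights), len(weights[0])
    place = [1.0] * n_places
    human = [1.0] * n_humans
    for _ in range(max_iterations):
        # A place scores highly when strongly linked to high-scoring humans.
        new_place = normalise([
            sum(weights[p][h] * human[h] for h in range(n_humans))
            for p in range(n_places)
        ])
        # A human scores highly when strongly linked to high-scoring places.
        new_human = normalise([
            sum(weights[p][h] * new_place[p] for p in range(n_places))
            for h in range(n_humans)
        ])
        change = max(abs(a - b) for a, b in zip(new_place, place))
        place, human = new_place, new_human
        if change < tolerance:
            break
    return place, human


# Made-up 3P by 5H network: rows are places P1..P3, columns are humans H1..H5.
PLACES = ["P1", "P2", "P3"]
WEIGHTS = [
    [0.9, 0.8, 0.0, 0.7, 0.0],
    [0.0, 0.2, 0.1, 0.0, 0.0],
    [0.0, 0.0, 0.5, 0.3, 0.6],
]

place_scores, _ = hits_scores(WEIGHTS)
ranking = sorted(zip(PLACES, place_scores), key=lambda item: item[1], reverse=True)
for name, score in ranking:
    print(f"{name}: {score:.4f}")
```

With these invented weights, the iteration ranks P1 first because it has the strongest links to the largest number of humans. In the standard HITS formulation, scores of this kind converge to the principal eigenvector of the weight matrix multiplied by its transpose, which is what allows a single ranked list of places to be read off the network.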

1.2 RESEARCH PROBLEMS

Malaria transmission in public places is a problem that needs scientific intervention. An article by Rogers (2009) reported that a number of public places, such as bars and restaurants, had closed outdoor terraces or shut down completely because of what was described as a “100 billion mosquito invasion”. Unfortunately, there is a research gap in the detection of public places that harbour these vectors. A practical scenario was observed in 2010 when a team of vector control experts visited UNIMAS to spray the institution against malaria and dengue vectors. In an interview, they mentioned that they lacked vector detection tools, which meant that the team may have sprayed in the wrong places.

A number of existing vector detection tools have associated disadvantages. One such technology is the laser-wielding robot that detects mosquitoes in the air and shoots them dead (Robert, 2009). Two serious concerns raised by the public
are that the lasers could be harmful to human beings, and that the technology could mistakenly kill other insects that are useful to the ecosystem. Vector detection is therefore an issue that calls for research.

Disease modeling methods that assume population homogeneity have been described as faulty and unrealistic (Tom & Gerardo, 2009). The term population homogeneity refers to the assumption that every person within the disease transmission environment has an equal probability of mixing with others and hence of being infected. An improvement over this strategy is to build models that emphasize the variation of contacts leading to disease transmission. Network modeling takes this variation into consideration, and it is the method proposed in this research to address the observed deficiency.

The difficulty in modeling malaria transmission arises from the complex life cycle of the disease. Fortunately, every malaria transmission involves contacts (blood-sucking bites) between human beings and the vectors. While this scenario could be used to build contact networks for malaria transmission studies, some important issues must be dealt with, one of which is that public places and human beings have different attributes. This means that a heterogeneous network, rather than a single-node network, is more appropriate for studying malaria transmission in public places. However, Christakis & Fowler (2009) stated that the complexity associated with generating heterogeneous networks has impeded much network research, which is an issue that needs to be addressed.

Given that a contact network model is involved, a search engine is an appropriate means of detecting the public places of interest, and this is another research gap to be filled. For this purpose, we propose applying a web search engine algorithm to the contact network for vector reservoir detection.
To the best of our knowledge, no previous research has applied
