South African National Bioinformatics Institute

Research Institute at the University of the Western Cape since 1997

South African Medical Research Council Bioinformatics Capacity Development Unit since 2002

World Health Organisation Tropical Disease Research regional training center since 2004

Department of Science and Technology National Research Foundation Research Chair in Bioinformatics and Public Health Genomics since 2007

Contents

Policy Mandates
Vision, Mission, Goals
Director’s Message
2011 Overview
Staff
Capacity Development
    Undergraduate Training Programme
    Postgraduate Training Programme
    SANBI Graduations
    Conferences, Workshops and Courses organised by SANBI
    Internships
    Conference Participation
Computational Resources
Awards and Honours
SANBI in the Media
Community Engagement
Research Outputs
    Summary
    Journal Publications
    Chapters in Books
    Software and Similar Outputs Developed
    Keynote and Invited plenary
    Conference Presentations or Posters
    Theses
    Expert Panel or Professional Membership
    Policy Briefs
    Intervention Programmes
Research Projects Overview
Research Labs
    Alan Christoffels
    Junaid Gamieldien
    Gordon Harkins
    Nicki Tiffin
    Simon Travers
Research Collaborations
Financials
End of Year Party
Alumni

POLICY MANDATES

National Strategic Plan for HIV/AIDS, STIs and TB (2012 - 2016)

The vision and mission of SANBI align with Draft Zero of the National Strategic Plan (NSP). This draft specifies Research and Innovation as a key enabler of the NSP, and proposes that “relevant research provides information and the impetus for innovation within the implementation of the NSP”, and that strategic priorities should include “concrete plans to improve capacity for Research” and “a budget for research”.

The Department of Science and Technology’s Ten Year Innovation Plan (2008 - 2018)

One of the five Grand Challenge areas specified in this Plan is the “Farmer to Pharma” value chain to strengthen the bioeconomy. SANBI’s genomics programme, which straddles both communicable and non-communicable diseases, aligns clearly with this Grand Challenge.

The MRC Act (Act 58 of 1991)

As an extramural unit of the MRC, SANBI falls under the legislative and other mandates of the MRC. In Section 3, this Act states that the Legislative Mandate of the MRC is: "Through research, development and technology transfer, to promote the improvement of the health and quality of life of the population of the Republic, and to perform such functions as may be assigned to the MRC or under this Act."

Annual Report 2011

Vision

To become a centre of global, African and South African excellence, achieving the highest levels in biomedical research and education.

Mission

• To conduct cutting edge bioinformatics and computational biology research relevant to South African, African and global populations.
• To develop human resources in bioinformatics and computational biology by educating and mentoring scientists.
• To increase awareness of and access to bioinformatics and computational biology resources.

Goals

• To generate and publish high quality, relevant biomedical research.
• To train and graduate competent and productive researchers.
• To add value to the academic program of the University of the Western Cape.
• To enhance other research fields through collaborative projects.
• To establish sources of renewable funding to pursue the mission of the institute.

Director's Message

It is with much excitement that I can reflect on 2011 in the context of bioinformatics development both locally and across the African region. The front cover of our annual report each year captures a significant event pivotal to our yearly activities. The large cohort of MSc and PhD graduates (6) in 2011 has been a proud moment for the institute as we renew our commitment to develop bioinformatics capacity in South Africa and the African continent. Our recent graduates have either continued with PhD studies or have taken up faculty positions locally and abroad. During 2011 we co-hosted the African Society for Bioinformatics and Computational Biology conference in Cape Town and we hosted 98 African scientists for a 2-day workshop at our training facility on the University of the Western Cape campus. These activities will continue to expand in 2012 through our mandate as the MRC bioinformatics capacity development unit especially with the increased need for expertise in next generation sequencing analysis.

Our staff recruitment exercise, completed in 2010, has cemented our diverse research portfolio and arguably places us as the forerunners in bioinformatics research on the African continent. We have retained a funding stream from both local (87%) and international (13%) donors and acknowledge the generous support from the Department of Science and Technology/National Research Foundation Research Chair programme and the South African Medical Research Council in a climate where global research funding has declined. The future certainly does not look bleak, given the announcement of the National Institutes of Health and Wellcome Trust’s commitment to support research on the African continent through the Human Heredity and Health in Africa (H3Africa) programme.

The University of the Western Cape has contributed generously to the strategic acquisition of research equipment at SANBI and has supported our purchase of a high performance Dell machine (88 CPUs and 512 GB of memory). This hardware is being integrated into our existing infrastructure and will also support other computing-intensive research on campus such as the Astronomy Research Chair Programme. Our technical staff have made impressive progress with the development of an in-house cloud solution and virtual environment that facilitates distributed computing with ease. We believe that these solutions can be of value to other scientific computing groups locally and abroad.

I congratulate all our staff and students on their impressive strides in seeking biomedical solutions to diseases plaguing our society and contributing to a vibrant research culture at SANBI and on the university campus.

Professor Alan Christoffels
DST/NRF Research Chair in Bioinformatics and Health Genomics

2011 Overview

Highlights:

• 1 MSc and 5 PhD graduates.
• 5 students attended international conferences to present their research, while 4 students undertook internships internationally (3) and nationally (1).
• 12 high-impact publications and 8 keynote addresses.
• Unit Director, Alan Christoffels, was elected to the Academy of Science of South Africa.
• Two international training workshops were held and attended by 75 participants from multiple African countries.
• Three training workshops were held for students and researchers in South Africa.
• Organised the African Society for Bioinformatics and Computational Biology meeting that was hosted in Cape Town.
• Acquisition of a high performance Dell machine with 512 GB of memory to expand our national mandate of providing bioinformatics research support.
• 6 unique public resources for the biomedical community contributed by SANBI.

Research objectives over the last 12 months:

• To provide relevant bioinformatics services to HIV, Hepatitis C, tuberculosis and sleeping sickness researchers.
• To provide bioinformatics research capacity development and training in the key health domains of HIV, TB, the control of Tsetse fly (Glossina morsitans) and other disease vectors.
• To develop and provide analytical algorithms to discover genes that contribute to the development of complex diseases.
• To develop and provide a sequence database and genome annotation tools to describe the molecular epidemiology and drug resistance profile of HIV.
• To develop a software solution for user-friendly HIV drug resistance testing using the 454 sequencing platform.
• To develop a software solution for identifying nucleotide variation in Mycobacterium tuberculosis genomes.
• To build a research program to identify and characterise biologically relevant secondary structures in nucleotide alignments of pathogenic RNA viruses.

Progress achieved:

The unit delivered 12 health-related publications containing significant discoveries in 2011. These include articles in high impact venues such as Nature Genetics (1), Journal of Virology (1) and AIDS Research (1), and one book chapter on disease gene prioritisation. The publications contribute to a total journal impact factor of 76.211. Unit researchers were invited to give 8 talks nationally, in Africa and internationally.

Over the past year, 96% of the 27 postgraduates qualified as historically disadvantaged and 37% were female. Of the 7 postdoctoral fellows, 86% were historically disadvantaged and 43% were female.

As a leading HIV research team, we have contributed to the understanding of HIV CXCR4-usage during disease progression. In Malawi, we investigated and published the prevalence of drug resistance in a treatment-naïve population of HIV infected individuals. We showed that drug resistance mutations present in patients prior to treatment had no effect on treatment outcome.

In South Africa we investigated the evolution of HIV-1 subtype C viruses in the female genital tract relative to the blood plasma through a longitudinal study of the sequence diversity in HIV-1 infected patients during acute and chronic infection. Our virology researchers are also investigating the degree to which RNA viral evolution is constrained by secondary structure in the families Caliciviridae and Picornaviridae (one of the most genetically diverse of the positive-sense single-stranded RNA viral families and the most common cause of infections in humans in developed countries). This work has led to the development of a viral RNA secondary structure prediction algorithm.

Through participation in international efforts such as the International Glossina Genome Initiative (IGGI), the SYSCO consortium and the HIV Dynamics Meeting, the Unit has maintained strong international links, and has benefited from capacity and skills development. In particular, we organised the international HIV Dynamics conference held in Ireland. Internationally, SANBI is synonymous with the leading African bioinformatics effort. The unit has continued its contribution to the WHO-sponsored analysis of the recently sequenced and assembled Glossina morsitans genome and we continue to host a sleeping sickness portal for researchers on the African continent.

During 2011, the unit contributed significantly to a proposal to the NIH for a Pan African Bioinformatics Network that would support the H3Africa (www.h3africa.org) funded genetics projects. SANBI will serve as a centre of excellence to support genetics researchers in Southern Africa. SANBI has also partnered with clinicians and geneticists in pursuing genomic approaches to understanding diseases under this initiative.

We have tested the utility of our internationally competitive semantic database to clarify the often unclear or unapparent links between novel mutations recently reported in the literature and the diseases or phenotypes being investigated, and have found that we can often provide better insights than the original publication. Our system often uncovers the underlying biological mechanisms that lead to the development of phenotypes associated with a disease.

An active collaboration with the TB centre of excellence at Stellenbosch Medical School has led to the development of methodology to analyse large volumes of sequencing data for Mycobacterium tuberculosis and the identification of virulence associated mutations. Through an international effort led by SANBI we have modeled four novel TB drug targets that are being subjected to docking studies. This work provided the impetus for a PhD student within the unit to spend two weeks in a structural biology laboratory in Spain.

SANBI’s expertise in next generation sequencing methodologies has led to the development of an exome sequencing and knowledge-discovery pipeline to identify the genetic cause(s) of disease in a patient with multiple sclerosis. This protocol allows for the identification of rare mutations associated with phenotypes that would normally be missed. We are developing a novel concept to prioritise mutations by mining the database for genes associated with ‘surrogate’ or ‘secondary’ phenotypes linked to disease (e.g. demyelination in multiple sclerosis).

Our algorithm development effort has achieved a significant breakthrough with the development of a statistical method to identify driver genes in the development and metastasis of breast cancer. Our method combines and re-analyses large numbers of data sets that may have been generated on different technology platforms as a means to increase the statistical power of the meta-analysis, while weakening the effects of individual study-specific biases.
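
The idea that pooling data sets increases statistical power can be illustrated with a textbook inverse-variance (fixed-effect) meta-analysis. This is a generic sketch only, not the method developed at SANBI, which this report does not describe. The function name and the per-study effect sizes and standard errors below are hypothetical.

```python
import math


def fixed_effect_meta(effects, std_errors):
    """Pool per-study effect estimates with inverse-variance weights.

    Generic textbook fixed-effect model; not SANBI's method.
    """
    # Each study is weighted by the inverse of its variance.
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return pooled, pooled_se


# Hypothetical log fold-changes for one gene measured in three studies.
effects = [0.8, 1.1, 0.9]
std_errors = [0.5, 0.4, 0.6]

pooled, pooled_se = fixed_effect_meta(effects, std_errors)
# The pooled standard error is smaller than that of any single study,
# which is the sense in which pooling increases statistical power.
print(f"pooled effect = {pooled:.3f}, standard error = {pooled_se:.3f}")
```

Because every study contributes only a share of the total weight, an idiosyncrasy of any one study has a limited influence on the pooled estimate.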

Staff

Our academic staff comprises 5 principal investigators supervising a total of 27 postgraduate students. Our computational research environment is maintained by a dedicated systems administrator, a software developer and a database administrator. Administrative support to both students and staff is provided by 4.5 admin staff. The Director position at SANBI is currently being filled by an interim Director, Alan Christoffels, who holds the DST/NRF Research Chair in Bioinformatics and Public Health Genomics. Applicants are requested to contact Professor Christoffels for further information.

SANBI 2011 staff:

Samantha Alexander Assistant to the DST/NRF Research Chair

Alan Christoffels DST/NRF Research Chair in Bioinformatics and Health Genomics; Interim Director: SANBI

Junaid Gamieldien Senior Lecturer

Dale Gibbs Systems Administrator

Gordon Harkins Senior Lecturer

Mario Jonas Database Administrator

Fungiwe Mpithi Receptionist

Ferial Mullins Finance Administrator

Maryam Salie Student Administrator

Nicki Tiffin Senior Lecturer

Simon Travers Associate Professor

Peter van Heusden Senior Systems Administrator

Junita Williams HR Administrator

Vladimir Bajic External Professor

Winston Hide External Professor

Staff development

Continuous staff development is encouraged at SANBI and during the past year staff have developed skills to enhance their work performance through workshops and formal degree studies.

Conferences/workshops attended by staff:

Staff member: Peter van Heusden
Conference/Workshop: Galaxy Community Conference 2011, Lunteren, Netherlands.
Benefit: This conference gave us an invaluable opportunity to network with others working to make bioinformatics workflows more accessible to life scientists. The speakers presented the latest developments in the Galaxy platform as well as examples of how they adapted Galaxy to meet the needs of their local software environment. Insights gleaned from the talks and from face to face networking with conference participants have informed IT and bioinformatics workflow planning at SANBI.

Staff member: Nicki Tiffin
Conference/Workshop: Workshop on Empowering Genomics in Southern Africa Application to Infectious Disease, Limpopo, June 2011.
Benefit: Learning new technologies for genomics research for infectious diseases in Africa. Co-hosted by the J Craig Venter Institute and the University of Limpopo.

Staff member: Ferial Mullins
Conference/Workshop: 2011 Global Finance Conference, Dubai, November 2011.
Benefit: The conference aim was to bring together finance managers of the private and public sectors and together we worked through strategies on how to share funding or financing between the sectors, e.g. how the private sector can become donors to the public sector and vice-versa.

Enrolled staff:

Staff member: Peter van Heusden
Degree: Honours Part-time, Information Technology
Institution: University of South Africa
Enrolled or graduated: Enrolled
Graduation month: December 2012

Strategic Planning Session

Annual strategic review and planning for the following year occurs during November/December of an academic calendar year. In 2011, staff at SANBI held their planning session from 1-2 December 2011 at Feathers Lodge, Durbanville. This breakaway session provided an opportunity to build inter-personal relationships and streamline processes within the institute to deliver on our mandate of providing national bioinformatics training and internationally competitive research.

SANBI Staff at the 2011 Strategic Planning Session

Capacity Development

Undergraduate Training Programme

During 2011 Thapelo Mohotsi, a computer science BSc graduate, was recruited for 12 months as an intern working on the development of a web resource to capture and compare genetic profiles of 100 individuals. This project has cemented an exciting collaboration with the Forensics Laboratory at the University of the Western Cape.

Third year Biotechnology module

SANBI lecturers taught an introductory bioinformatics course BTN323 to 53 third-year Biotechnology students.

2011 SANBI Postgraduate Registration:

Student               Gender  Nationality   Degree   Year since first enrolment
Saleema Crous         F       South Africa  MSc      1
Fredrick Nindo        M       Kenya         MSc      1
Wisdom Akurugu        M       Ghana         MSc      1
Emil Tanov            M       South Africa  MSc      1
Darlington Mapiye     M       Zimbabwe      MSc      1
Saleem Adam           M       South Africa  MSc      4
Mmakamohelo Direko    F       South Africa  MSc      2
Firdous Khan          F       South Africa  MSc      2
James Matthews        M       South Africa  MSc      2
Oreetseng Moncho      F       South Africa  MSc      2
Ram Shrestha          M       Nepal         PhD      1
Mahjoubeh Jalali      F       South Africa  PhD      1
Ibrahim Ahmed         M       Sudan         PhD      1
Emad Fadhal           M       Sudan         PhD      1
Mushal Ali            M       Sudan         PhD      4
Ruben Cloete          M       South Africa  PhD      3
Musa Gabere           M       Kenya         PhD      4
Zahra Jalali          F       South Africa  PhD      2
Samuel Kwofie         M       Ghana         PhD      4
Mbandi Kimbung        M       Cameroon      PhD      2
Monique Maqungo       F       South Africa  PhD      4
Sarah Mwangi          F       Kenya         PhD      2
Edwin Murungi         M       Kenya         PhD      4
Alecia Naidu          F       South Africa  PhD      2
Kavisha Ramdayal      F       South Africa  PhD      2
Mark Wamalwa          M       Kenya         PhD      4
Adugna Woldesemayat   M       Ethiopia      PhD      2
Sumir Panji           M       Kenya         PostDoc  2
Samson Muyanga        M       South Africa  PostDoc  2
Oupa Tsotetsi         M       South Africa  PostDoc  2
Barbara Picone        F       Italy         PostDoc  1
Uljana Hesse          F       Germany       PostDoc  1
Gordon Jamieson       M       Scotland      PostDoc  1
Natasha Wood          F       South Africa  PostDoc  1

Postgraduate Training Programme

Over the past year, 96% of the 27 postgraduates trained through SANBI qualified as historically disadvantaged and 37% were female. Of the 7 postdoctoral fellows, 86% were historically disadvantaged and 43% were female.
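
The gender percentages can be cross-checked against the per-student 2011 registration list. The sketch below is only an illustrative check: the counts were tallied by hand from that list, and the names `registrations` and `percent_female` are hypothetical, not part of any SANBI tooling.

```python
# Female/male counts tallied by hand from the 2011 registration list
# (an assumption of this sketch; recount if the list changes).
registrations = {
    "MSc": {"F": 4, "M": 6},
    "PhD": {"F": 6, "M": 11},
    "PostDoc": {"F": 3, "M": 4},
}


def percent_female(groups):
    """Return the rounded percentage of women across the given groups."""
    female = sum(registrations[g]["F"] for g in groups)
    total = sum(registrations[g]["F"] + registrations[g]["M"] for g in groups)
    return round(100 * female / total)


postgraduates = percent_female(["MSc", "PhD"])  # 10 of 27 students
postdocs = percent_female(["PostDoc"])          # 3 of 7 fellows
print(postgraduates, postdocs)
```

Run as a script, this prints 37 and 43, in line with the 37% reported for postgraduates and the 43% reported for postdoctoral fellows in the 2011 Overview.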

2011 SANBI Students

Distribution of postgraduate student registrations for the period 2001 – 2011:

10 students registered for MSc:

Country       Total  Males  Females
South Africa  7      3      4
Kenya         1      1      0
Ghana         1      1      0
Zimbabwe      1      1      0

17 students registered for PhD:

Country       Total  Males  Females
South Africa  6      2      4
Nepal         1      1      0
Sudan         3      3      0
Kenya         4      3      1
Ghana         1      1      0
Cameroon      1      1      0
Ethiopia      1      1      0

7 registered Postdoctoral fellows:

Country       Total  Males  Females
South Africa  3      2      1
Kenya         1      1      0
Italy         1      0      1
Germany       1      0      1
Scotland      1      1      0

Some of the UWC Science PhD Cohort, September 2011 Graduation

2011 graduates: Musa Gabere, Samuel Kwofie, Monique Maqungo, Conor Meehan, Saleem Adam, Mark Wamalwa

SANBI graduations during 2011:

Name: Musa Gabere
Degree: PhD, UWC
Thesis: Prediction of antimicrobial peptides using hyperparameter optimised support vector machines.

Name: Samuel Kwofie
Degree: PhD, UWC
Thesis: Development of a Hepatitis C virus knowledgebase with computational prediction of functional hypothesis of therapeutic relevance.

Name: Monique Maqungo
Degree: PhD, UWC
Thesis: Prostate cancer knowledgebase with functional genomic data analysis.

Name: Conor Meehan
Degree: PhD, National University of Ireland, Galway
Thesis: Understanding the interaction between HIV-1 and chemokine receptors during host cell entry with focus on the potential for resistance to CCR5-antagonists.

Name: Mark Wamalwa
Degree: PhD, UWC
Thesis: Development of a comprehensive annotation and curation framework for analysis of Glossina morsitans morsitans expressed sequence tags.

Name: Saleem Adam
Degree: MSc, UWC
Thesis: A knowledge base of stress response gene-regulatory elements in Arabidopsis thaliana.

Conferences, workshops and courses organised by SANBI

2011 provided an exciting year of co-organising at least 3 international conferences as outlined below. Workshops to support local researchers and students are listed in an accompanying table.

ISCB Africa ASBCB Conference, 09 – 11 Mar 2011

SANBI staff co-organised the ISCB ASBCB Bioinformatics Conference in Africa which was held at the Cape Town International Convention Centre and attended by 180 scientists from around the world. Of these, 80% were from 13 African countries. The conference was preceded by two days of workshops with three parallel sessions. These were hosted at SANBI and attended by over 90 participants. We were involved in the initiation of the ASBCB mentorship programme with SANBI faculty members volunteering to act as mentors for African students requiring guidance in their research projects.

The conference shared a session with the Joint International Conference of the African and Southern African Societies of Human Genetics that was held back-to-back with the bioinformatics conference, facilitating interactions and discussion between attendees from both conferences. SANBI director Alan Christoffels chaired the Scientific Committee for the conference, and also chaired the first session on functional genomics. Topics covered during the bioinformatics conference included African genomics, bioinformatics analysis for human genetics, host and pathogen systems biology, database and tool development, molecular epidemiology and evolution, search and design of vaccines and drugs, functional genomics and comparative genomics. Many international speakers gave talks on their research and SANBI was well represented in the oral presentations with Simon Travers presenting his work on characterising HIV resistance to treatment with CCR5 antagonists; Ruben Cloete presenting his work on in-silico TB drug design using comparative genome analysis of DS, MDR and XDR isolates from KZN; Samuel Kwofie presenting his research on inferring enriched biological information from graphs composed of text-derived biomedical concepts of ontologies related to Hepatitis C Virus; and Gordon Harkins presenting his work on the spread of tomato yellow leaf curl virus from the Middle East to the world.

Africa – India Joint Virtual Conference 2011, 10 – 11 Feb 2011

Two SANBI postgraduate students, Stanley Mbandi Kimbung and Kavisha Ramdayal, successfully organised the South African hub of the Africa-India Joint Bioinformatics Virtual Conference 2011 (Bifx11). They served as Chair and Technical Chair. This meeting followed the successes of Bifx09 and Bifx10. This joint conference, which saw the participation of hubs in India and Africa, was organised by the Regional Student Groups of Central and Southern Africa, Bioinformatics Organisation, and supported by Bioclues.org and the African Society for Bioinformatics and Computational Biology. The South African hub was hosted at SANBI and attended by 30 participants drawn from various institutions in South Africa. Major highlights were presentations by two of SANBI's researchers, Junaid Gamieldien and Simon Travers, entitled “Semantic Integration of Biomedical Knowledge and Existing Data to Support In-Silico Discovery” and “A tale of two pathways: HIV resistance to treatment with CCR5 antagonists” respectively.

18th International HIV Dynamics and Evolution Conference, 01 – 04 May 2011

The HIV Dynamics and Evolution Conference is the premier international conference for researchers who explore HIV evolution and diversity. In 2011 Prof Travers co-hosted the conference, which was held in Galway, Ireland. The conference was attended by almost 140 delegates with session topics including molecular epidemiology, HIV diversity and phylodynamics, vaccines and drug design as well as ultradeep sequencing methods and approaches. On the second night of the conference there was a seminar sponsored by SANBI, given by Edwin J. Bernard, who spoke about HIV and the criminal law: combating stigma through science.

NICD 454 Sequencing Platform Launch, 09 Nov 2011

Prof Travers was an invited speaker at a workshop sponsored by Roche at the National Institute for Communicable Diseases (NICD) for the launch of their new 454 Sequencing Platform. He spoke about his work using 454 sequencing to study the viral populations of individuals infected with HIV. In particular he focused on the bioinformatics issues of such work and presented solutions that are currently under development at SANBI. The talk was well received, led to stimulating discussion during the workshop and has resulted in further potential collaborations that are currently being explored. An exciting follow-up has been discussed between SANBI staff and Roche to host next generation sequencing analysis workshops in the following year.

Workshops and courses hosted or presented by SANBI:

Course: National Bioinformatics Workshop
Nature and Purpose: A 7-week bioinformatics course that is held 5 days a week from 9am till 5pm. This course aims to give postgrad students an introduction to a range of bioinformatics topics.
Target audience: Postgraduate students in South Africa.
Benefit: Students develop in-depth knowledge of key topics that informs their postgraduate thesis design.

Course: Introduction to databases/gene expression analysis and interaction pathways
Nature and Purpose: Course offered in collaboration with the European Bioinformatics Institute. The material is designed to teach skills in genetic data analysis.
Target audience: 47 Biomedical researchers on the African continent including South Africa.
Benefit: The combined theoretical and practical sessions give participants ‘real’ data to explore.

Course: Population genetics
Nature and Purpose: A course designed to give researchers, in particular genetics researchers, a combination of theoretical and practical insights into population genetics analyses.
Target audience: 22 Biomedical researchers on the African continent
Benefit: Understanding of popular analyses such as genome wide association studies.

Course: Africa-India Joint Virtual Conference 2011 (Bifx11)
Nature and Purpose: A virtual conference with 6 speakers, a tutorial and group discussion to encourage engagement between students and researchers to study interactions of pathogens, hosts and vectors in relevant diseases.
Target audience: 140 attendees from across the globe, including 30 participants drawn from various institutions in South Africa
Benefit: To foster virtual interactions and collaborations among students, as well as researchers, of Africa and India and help to further the advancement of science there.

Course: ENSEMBL
Nature and Purpose: To train biologists how to access genome variation information and gene content data for various species. The last day was devoted to accessing the genome data via automated scripts.
Target audience: Students and researchers in South Africa
Benefit: A plethora of precomputed genomics data has been generated internationally and access to this information would inform new experiments and provide medical researchers with tools to investigate their disease of interest.

Course: University of Mauritius Masters Course
Nature and Purpose: Lectures in population genetics and disease, genotype data, genome-wide association studies, computational disease-gene prioritisation approaches, modes of gene dysregulation and functional predictions for single nucleotide polymorphisms in disease.
Target audience: Students studying for a Masters in Bioinformatics degree at the University of Mauritius.
Benefit: The students attended lectures, and were also exposed to hands-on approaches through tutorials on all the subjects.

Course: EBIOKIT
Nature and Purpose: EBIOKIT is a standalone server that contains the necessary bioinformatics tools needed by researchers. The course aims to teach researchers how to maintain a local copy of this KIT and as a consequence, provide bioinformatics support.
Target audience: 17 Biomedical researchers on the African continent
Benefit: Research laboratories can have their in-house bioinformatics toolkits without the dependence on internet resources.

Internships for SANBI students

The postgraduate training experience at SANBI includes opportunities to engage with overseas institutions to develop specific skill sets needed for a project. During 2011, 4 students visited various international and South African laboratories.

International internships:

Ruben Cloete
10 – 21 January 2011
Centro de Investigacion Principe Felipe (CIPF), Valencia, Spain
Objectives of visit:
• Gain insight into the basic concepts of comparative modeling and docking of ligands
• Use the MODELLER and AutoDock Vina software packages

Oreetseng Moncho
4 – 24 October 2011
European Bioinformatics Institute, Hinxton, Cambridge
Objective of visit:
• Develop skills to implement a local version of an ENSEMBL webserver for agriculture research data visualization.

Edwin Murungi
14 Mar – 02 July 2011
Mark Field’s Laboratory, Department of Pathology, Cambridge University
Objective of visit:
• Use immuno-fluorescence microscopy to identify sub-cellular localization of SNARE proteins expressed in trypanosomes grown in culture

National internship:

Mushal A. M. A. Ali
5 June – 5 August 2011
Vector Control Reference Unit (VCRU), National Institute for Communicable Diseases (NICD), Johannesburg, South Africa
Objectives of visit:
• Identify and characterise microRNA expressed in the different developmental periods (eggs, larvae, pupae and adults) of the second major malaria vector in Africa, Anopheles funestus.
• Anopheles mosquito rearing and identification.
• RNA extraction, purification and quantification.


Conference participation

During 2011 students had opportunities to interact with foreign institutions and some participated at local and international conferences.

International Conferences: 5 students attended international conferences and presented their research.

Kavisha Ramdayal
26 – 9 March 2011
7th Biovision World Life Sciences Forum
Lyon, France
• Authored an article for Naturejobs covering one of the BioVision sessions, available online at http://blogs.nature.com/naturejobs/2011/04/07/tech-savvy-scientists-needed-for-healthcare-innovation.

Firdous Khan
15 – 20 July 2011
19th Annual International Conference on Intelligent Systems for Molecular Biology and 10th European Conference on Computational Biology
Vienna, Austria

Stanley Mbandi Kimbung
31 August – 09 September 2011
EMBO Global Exchange Lecture Course: “Next Generation Sequencing for Africa”
Nairobi, Kenya

Mahjoubeh Jalali
12 – 15 November 2011
6th International Conference on Genomics
Shenzhen, China

Sumir Panji
10 – 13 November 2011
Bringing Together the Tsetse Genome Research Meeting
Sanger Institute, Cambridge, UK


National Conferences: 6 students attended national conferences and presented their research.

Kavisha Ramdayal
30 January – 5 February 2011
EMBO Global Exchange Lecture Course on HIV/AIDS
Stellenbosch, South Africa

Fredrick Nindo
30 January – 5 February 2011
EMBO Global Exchange Lecture Course on HIV/AIDS
Stellenbosch, South Africa

Samson Muyanga
07 – 11 March 2011
South Africa/Argentina Joint Regional Biosafety Workshop and Seminar “Biosafety of GM crops: Emerging issues and challenges in regulatory decision making”
Pretoria, South Africa

Samson Muyanga
11 – 15 September 2011
Genetically modified organisms in horticulture Symposium
Pretoria, South Africa

Kavisha Ramdayal
25 – 26 October 2011
UWC Faculty of Science Post-Graduate Research Open Day 2011
Bellville, South Africa
Poster: “Characterising HIV-1 subtype C gp160 envelope sequence diversity in the female genital tract and plasma during acute and chronic infection.”

Emil Tanov
25 – 26 October 2011
UWC Faculty of Science Post-Graduate Research Open Day 2011
Bellville, South Africa
Poster: “Identification of biologically important secondary structures in Enterovirus genus”.


Computational Resources

Historically, SANBI has relied on a heterogeneous server infrastructure, with a collection of over a dozen servers ranging from simple Intel PCs to IBM p690s. These have provided web services, a high performance computing infrastructure, and the various infrastructure services (authentication, mail, file server) necessary for the institute's functioning. In 2011 we started migrating from this old infrastructure (80 CPU cores and 8 TB of disk space, spread across various servers) to a new, more powerful server infrastructure.

The core of SANBI's new server infrastructure consists of a set of Dell blade servers, with a total of 88 CPU cores and 512 GB RAM. The blade servers include six M710HD servers, with between 64 and 96 GB of RAM and 12 CPU cores each, and one M910HD with 512 GB of RAM and 16 CPU cores. Together with an R710 rack-mounted server, these provide an environment for both High Performance Computing and a widespread use of virtual machine (VM) technology to meet the divergent needs of the Bioinformatics community. We are currently researching "private cloud" solutions to manage this VM infrastructure.

The new servers are connected with each other, and with our new Dell EqualLogic storage area network (SAN), by 10 Gb Ethernet, a tenfold speedup of our network infrastructure. All in all, the new server infrastructure is approximately 12 times more powerful than our old servers.
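As a quick consistency check, the quoted total of 88 CPU cores can be reproduced from the per-blade figures given above. The sketch below is illustrative only: the data layout and the names `blades` and `total_cores` are assumptions made for this example and are not part of SANBI's tooling. RAM is left out because the M710HD figures are given only as a range, and the R710 is left out because its core count is not stated.

```python
# Blade inventory as described in the text (the layout is an illustrative assumption).
blades = [
    {"model": "M710HD", "count": 6, "cores_each": 12},
    {"model": "M910HD", "count": 1, "cores_each": 16},
]

OLD_INFRASTRUCTURE_CORES = 80  # figure quoted for the old servers


def total_cores(inventory):
    """Sum the CPU cores over all blades in the inventory."""
    return sum(b["count"] * b["cores_each"] for b in inventory)


new_cores = total_cores(blades)
print("New blade cores:", new_cores)
print("Additional cores over old infrastructure:", new_cores - OLD_INFRASTRUCTURE_CORES)
```

The raw core count grows only modestly, so the "approximately 12 times more powerful" figure presumably reflects per-core performance, memory and network improvements rather than core count alone.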

The year 2011 has seen tremendous growth in the sophistication of our computing environment, as we have deployed Puppet for centralised configuration management and are migrating to an "infrastructure as code" approach to deploying and managing server infrastructure. This frees up time and effort from our computer systems group so that they can spend more time on new infrastructure research and development, and less time on maintenance.

The services provided to the SANBI community and their collaborators include:

1) A Grid Engine based compute cluster that allows simplified access to the compute resources we provide. Users of the cluster have access to more than 120 different bioinformatics analysis packages and more than 15 TB of disk space for storing research data and results.
2) Web servers, both for the main SANBI website and for presenting research outputs from SANBI's different research groups.
3) Application servers hosted on our virtual machine infrastructure, which each group within SANBI can configure as it sees fit. This approach is vital in bioinformatics, as the software environment required by groups can be dramatically different. While providing physical servers for each project is neither feasible nor optimal, using virtual machines allows us to provide for the needs of each group without their requirements coming into conflict with one another.
4) Database servers, providing both MySQL and PostgreSQL for storing relational data (RDBMS).
5) Centralised data collections, including up-to-date versions of major biological databases.

The computer systems group at SANBI continued to grow in terms of capacity throughout 2011 and, through our work at SANBI and collaboration with partners at other institutions, continues to provide for the computing needs of our SANBI research community.

Figures: Powervault with M710HD; M910HD


Awards and Honours

Alan Christoffels was elected to the Academy of Science of South Africa for recognition of scientific research.

Awardee: Alan Christoffels
Awarding Institution: Academy of Science of South Africa
Nature of award/recognition: Membership
Significance: Elected to the academy for recognition of scientific research

Awardee: Alan Christoffels
Awarding Institution: Belgium-Flemish government’s Flanders Care International Visitors’ programme
Nature of award/recognition: 6-member international team visiting health care initiatives in Flemish Belgium
Significance: The Unit Director was the South African representative on the trip with the aim of encouraging international exchange in the Health care sector.

Awardee: Emil Tanov
Awarding Institution: UWC Science Faculty
Nature of award/recognition: 2nd Best poster presented at the Research Open Day
Significance: Voted the 2nd best MSc research paper in the Science Faculty at its open day.

International team visiting healthcare initiatives in Flemish Belgium. From L – R: Mrs Biruta Kleina (Latvia), Mr Yuk Han Sun (Hong Kong), Dr Ilesh Jani (Mozambique), Dr Alan Christoffels (South Africa), Mr Piet Houwen (Vice President and General Manager, GENZYME), Mrs Catharina de Jong (Netherlands), Mr David Jones (UK), Mrs Christine Breugelmans (Senior Communications and PR Officer, Flemish Department of Foreign Affairs)

SANBI in the Media

The article below appeared in the Cape Argus of 27 July 2011. Professor Travers commented on prevention remaining the optimal approach for combating HIV in the future as no effective vaccine has been developed to date.

Mail & Guardian Supplement, 18 - 25 November 2011

Irish Times publication, 3 May 2011


HIV-related cases pose big problems for criminal law
CLAIRE O'CONNELL
The Irish Times - Tuesday, May 3, 2011

THE CRIMINAL justice system is ill-equipped to deal with the complexities of HIV, and legal decisions need to be based on good science, according to writer and advocate on HIV-related issues Edwin J Bernard, who will give a public seminar in Galway today.

“Although the first laws and prosecutions began in the late 1980s, there are more laws being passed and more prosecutions today than ever before,” said Mr Bernard, who analyses the global criminalisation of people living with HIV.

To date, more than 600 individuals in more than 40 countries have been convicted of HIV exposure or transmission, according to Mr Bernard, who will talk today about how criminal law is applied in such instances.

“I’ll argue that in the vast majority of cases, laws and prosecutions are irrational, unjust and counterproductive, based on stigma not science.”

Situations where a person genuinely intended to do harm should lead to prosecution, noted Mr Bernard.

“If someone has planned to harm someone by not telling them they are HIV-positive and then has unprotected sex with the intention of harming them – and they were actually infected – then these very, very rare cases should be, and are, prosecuted,” he said.

“However, there are many people with HIV in prison, including some who have died in prison, who did not do anything that risked harming someone with HIV, and certainly did not infect anyone.”

The laws and prosecutions also have a wider impact, added Mr Bernard, whose public seminar today is being sponsored by the South African National Bioinformatics Institute.

“For people living with HIV, they can create a climate of fear and uncertainty and media reporting of such cases does nothing to reduce HIV-related stigma, the greatest non-health-related challenge for those living with HIV.

“For everyone else, these laws and prosecutions are creating a distorted picture of HIV-related harm and risk and undermining the public health message that everyone shares responsibility for their sexual health,” he said.

“What UNAIDS and others are working on right now is to ensure that everyone involved in law making and in the criminal justice system understands the latest advances in the science of HIV.

“Justice can be better achieved when laws and legal decisions are based on good science, best practice guidelines for police and prosecutors are in place, and people with HIV accused of such ‘crimes’ have improved access to justice.”

The public seminar, HIV and the Criminal Law: Combating Stigma Through Science, will take place in the MRI Annex, NUI Galway today at 5.30pm and is part of the 18th International Conference on HIV Dynamics and Evolution, hosted by NUI Galway.


Flanders Today publication, 21 December 2011

International VIPs visit Flanders Care

Flanders last week welcomed six international opinion and decision makers as part of the Flanders International Visitors Programme, an ongoing programme of five-day visits focusing on different sectors. The theme this time was Flanders Care: Focus on Innovation and Entrepreneurship in Health Care in Flanders. The six opinion makers were Dr Alan Christoffels from South Africa, Catharina de Jong from the Netherlands, Dr Ilesh Jani from Mozambique, David Jones from the United Kingdom, Yuk Han Sun from Hong Kong and Biruta Kleina from Latvia.


Community Engagement

Over the past three years we have focused on the Life Orientation subject at schools and developed a grade 7-9 resource kit using Tuberculosis as a theme.

In 2008, the Centers for Disease Control and Prevention (CDC), USA, funded UWC under an HIV/AIDS public health programme. As part of this funding mechanism, Alan Christoffels from SANBI and Patricia Struthers from the Physiotherapy department were funded to develop educational material for grade 8 learners. During the past three years the resource was tailored to grade 7 - 9 learners to accommodate the national education department criteria. This project targeted 7 pilot schools, namely Elswood, Ravensmead, St Andrews, Sakumlandela, Iingcinga Zethu, York and Langenhoven.

Events that marked the progress on this project in 2011 included (a) completion of the curriculum content into 8 lesson plans with individual and group activities, (b) graphic illustrations for each lesson plan and (c) development of an electronic interactive DVD that replaced the information content in the lesson plans together with digital story-telling and games. The final year (2012/13) will be devoted to printing of 4000 copies of both the hard copy and DVD for each learner in the pilot schools and external monitoring and evaluation of the use of the resource kit.

Figure (labels: TB in sputum; HIV; University of the Western Cape; Interactive DVD; Workbook): Grade 7-9 interactive life orientation resource kit to be piloted in 7 schools in the Western Cape


Research Outputs

Summary:

Output Type            Output Total
Journal Publications   11
Book Chapter           1
Software               6
Invited Talks          8
Conference Talks       10
Conference Posters     15
Theses                 6
Theses Examined        4

Peer reviewed journal articles:

1. Samuel Kwofie, Ulf Schaefer, Vijayaraghava Sundararajan, Vladimir Bajic, Alan Christoffels. HCVpro: Hepatitis C virus protein interaction database. Infection, Genetics and Evolution. Dec 2011. 11(8):1971-1977. Impact factor: 3.086

2. Samuel K Kwofie, Aleksandar Radovanovic, Vijayaraghava S Sundararajan, Monique Maqungo, Alan Christoffels and Vladimir B Bajic. Dragon Exploratory System on Hepatitis C Virus (DESHCV). Infection, Genetics and Evolution. 2011. 11(4):734-9. Impact factor: 3.086

3. Sarah Mwangi, Edwin Murungi, Mario Jonas and Alan Christoffels. Evolutionary Genomics of Glossina morsitans immune-related Serine proteases and Serine Protease inhibitors. Infection, Genetics and Evolution. 2011. 11(4): 740-745. Impact factor: 3.086

4. Vladimir Shulaev, Daniel J Sargent, Ross N Crowhurst, Todd C Mockler, Otto Folkerts, Arthur L Delcher, Pankaj Jaiswal, Keithanne Mockaitis, Aaron Liston, Shrinivasrao P Mane, Paul Burns, Thomas M Davis, Janet P Slovin, Nahla Bassil, Roger P Hellens, Clive Evans, Tim Harkins, Chinnappa Kodira, Brian Desany, Oswald R Crasta, Roderick V Jensen, Andrew C Allan, Todd P Michael, Joao Carlos Setubal, Jean-Marc Celton, D Jasper G Rees, Kelly P Williams, Sarah H Holt, Juan Jairo Ruiz Rojas, Mithu Chatterjee, Bo Liu, Herman Silva, Lee Meisel, Avital Adato, Sergei A Filichkin, Michela Troggio, Roberto Viola, Tia-Lynn Ashman, Hao Wang, Palitha Dharmawardhana, Justin Elser, Rajani Raja, Henry D Priest, Douglas W Bryant Jr, Samuel E Fox, Scott A Givan, Larry J Wilhelm, Sushma Naithani, Alan Christoffels et al. The genome of woodland strawberry (Fragaria vesca). Nature Genetics. 2011. 43: 109-116. doi:10.1038/ng.740. Impact factor: 36.377

5. Vijayaraghava Seshadri Sundararajan, Musa Nur Gabere, Ashley Pretorius, Saleem Adam, Alan Christoffels, Minna Lehvaslaiho, John A. C. Archer and Vladimir B. Bajic. DAMPD: a manually curated antimicrobial peptide database. Nucleic Acids Research, 2011 (1-5). doi:10.1093/nar/gkr1063. Impact factor: 7.836

6. Niamh E. Redmond, Jean Raleigh, Rob W. M. van Soest, Michelle Kelly, Simon A. A. Travers, Brian Bradshaw, Salla Vartia, Kelly M. Stephens, Grace P. McCormack. Phylogenetic Relationships of the Marine Haplosclerida (Phylum Porifera) Employing Ribosomal (28S rRNA) and Mitochondrial (cox1, nad1) Gene Sequence Data. PLoS ONE. 2011. 6(9): e24344. Impact factor: 4.411

7. Semegni JY, Wamalwa M, Gaujoux R, Harkins GW, Gray A, Martin DP. NASP: a parallel program for identifying evolutionarily conserved nucleic acid secondary structures from nucleotide sequence alignments. Bioinformatics. 2011 Sep 1;27(17):2443-5. Epub 2011 Jul 14. Impact factor: 4.877

8. Monjane AL, Harkins GW, Martin DP, Lemey P, Lefeuvre P, Shepherd DN, Oluwafemi S, Simuyandi M, Zinga I, Komba EK, Lakoutene DP, Mandakombo N, Mboukoulida J, Semballa S, Tagne A, Tiendrébéogo F, Erdmann JB, van Antwerpen T, Owor BE, Flett B, Ramusi M, Windram OP, Syed R, Lett JM, Briddon RW, Markham PG, Rybicki EP, Varsani A. Reconstructing the history of maize streak virus strain A dispersal to reveal diversification hot spots and its origin in southern Africa. Journal of Virology. 2011 Sep;85(18):9623-36. Epub 2011 Jun 29. Impact factor: 5.189

9. Bansode VB, Travers SAA, Crampin AC, Ngwira B, French N, Glynn JR and McCormack GP. (13 October 2011). Reverse Transcriptase drug resistance mutations in HIV-1 Subtype C infected patients on ART in Karonga District, Malawi. AIDS Research and Therapy. 2011, 8(38). doi:10.1186/1742-6405-8-38. Impact factor: 1.77

10. Lefeuvre P, Harkins GW, Lett JM, Briddon RW, Chase MW, Moury B, Martin DP. Evolutionary time-scale of the begomoviruses: evidence from integrated sequences in the Nicotiana genome. PLoS One. 2011;6(5):e19193. Epub 2011 May 16. Impact factor: 4.411

11. Seager I, Leeson MD, Crampin A, Mulawa D, French D, Glynn J, Travers S.A.A., McCormack GP. HIV-1 mutational patterns in HIV-1 subtype C infected long-term survivors in Karonga District Malawi: correction and further analysis. AIDS Research and Human Retroviruses. 2011 August 30. Epub ahead of print. Impact factor: 2.082

TOTAL JOURNAL IMPACT FACTOR: 76.211
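The total can be checked by summing the impact factors listed for the eleven articles. The sketch below is a simple arithmetic check, not part of the original report; it assumes that the value 7.836 belongs to article 5 (Nucleic Acids Research), which is inferred from the layout of the list. The variable names are illustrative. `Decimal` is used so that the three-decimal figures add up exactly rather than picking up binary floating-point rounding.

```python
from decimal import Decimal

# Impact factor per article number, as listed in the report.
# Assumption: 7.836 is the figure for article 5 (Nucleic Acids Research).
impact_factors = {
    1: "3.086",
    2: "3.086",
    3: "3.086",
    4: "36.377",
    5: "7.836",
    6: "4.411",
    7: "4.877",
    8: "5.189",
    9: "1.77",
    10: "4.411",
    11: "2.082",
}

total = sum((Decimal(v) for v in impact_factors.values()), Decimal("0"))
print("Total journal impact factor:", total)
```

Under that assumption the sum reproduces the reported figure of 76.211.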

Chapters in books:

1. Tiffin, N. Conceptual thinking for prioritization of candidate disease genes. In Methods in Molecular Biology - In Silico Tools for Gene Discovery. Editors: Bing Yu; Marcus John Hinchcliffe. Publisher: Humana Press

Software and similar outputs developed or generated/implemented:

Year: 2011
Software resource: miRNA targets in insects
http://insectar.sanbi.ac.za
Impact: Output of a PhD project – tool to predict miRNA targets in insects with a focus on mosquitoes

Year: 2011
Software resource: HCV protein interaction database
http://apps.sanbi.ac.za/hcvpro
Impact: Output of a PhD project – hepatitis C virus resource that identifies protein-protein interactions

Year: 2011
Software resource: Glossina Genomics Analysis Resource
http://iggiweb.sanbi.ac.za/markw/
Impact: Output of a PhD project – a central portal for disease vector research on the African continent

Year: 2011
Software resource: Arabidopsis Stress-related Transcription Factor Database
http://apps.sanbi.ac.za/dastf/
Impact: Output of an MSc project – regulatory motifs identified in response to environmental stress on plants.

Year: 2011
Software resource: Grade 7-12 learners and Teachers e-learning resource kit
http://skills4life.org
Impact: Interdisciplinary project – an interactive DVD was finalised during 2011 together with a workbook. This electronic resource was launched in March 2012.

Year: 2011
Software resource: NASP: Nucleic Acid Structure Predictor
http://web.cbio.uct.ac.za/~yves/nasp_portal
Impact: Inter-university project – a parallel program for identifying evolutionarily conserved nucleic acid secondary structures from sequence alignments


Keynote and invited plenary:

Jan 2011 (Invited Talk): Simon Travers. Institute of Immunology Seminar Series, National University of Ireland, Maynooth. Uncovering the pathways of HIV resistance to antiretroviral therapies.

Feb 2011 (Keynote): Simon Travers. Bifx Africa-India joint virtual conference. Feb 2011. A tale of two pathways: HIV resistance to treatment with CCR5 antagonists.

Mar 2011 (Invited Talk): Simon Travers. Molecular and Cell Biology Seminar Series, University of Cape Town. Understanding the mechanisms of HIV-1 drug resistance.

May 2011 (Invited Talk): Alan Christoffels. Dentistry Faculty Research Day, UWC. Genomic Strategies to combat Tuberculosis.

Sep 2011 (Invited Talk): Simon Travers. Biomedical Sciences Seminar Series, Tygerberg Hospital, Cape Town. Investigating resistance to next-generation HIV therapeutic interventions.

Nov 2011 (Invited Talk): Simon Travers. National Institute for Communicable Diseases (NICD) 454 Sequencing Platform Launch. 09 November 2011. Amplicon-based approaches using 454 sequencing in viral and metagenomic studies - a bioinformatics perspective.

Nov 2011 (Invited Talk): Junaid Gamieldien. Novartis Research Day, UWC. Biomedical Knowledge Integration to Support Clinical Genomics.

Nov 2011 (Invited Talk): Alan Christoffels. Tsetse Genomics, Wellcome Trust Genome Campus, Cambridge, UK. Annotation of immunity genes.

Conference presentations or posters:

Jan 2011 (Talk): Simon Travers. Pathogen Biology and Evolution Meeting. Strasbourg, France. What now? HIV research in the South African National Bioinformatics Institute.

Feb 2011 (Talk): Junaid Gamieldien. Bifx Africa-India joint virtual conference. Feb 2011. Semantic Integration of Biomedical Knowledge and Existing Data to Support In-Silico Discovery

Feb 2011 (Talk): Nicki Tiffin. Bifx Africa-India joint virtual conference. Feb 2011. A bioinformatics perspective on human disease genetics

Mar 2011 (Talk): Gordon Harkins. ISCB Africa ASBCB Conference on Bioinformatics 2011. Cape Town, South Africa, March 2011. The spread of Tomato yellow leaf curl virus from the Middle East to the world.

Mar 2011 (Talk): Simon Travers. ISCB Africa ASBCB Conference on Bioinformatics 2011. Cape Town, South Africa, March 2011. A tale of two pathways: Characterising HIV resistance to treatment with CCR5 antagonists.

Mar 2011 (Talk): Ruben Cloete, Ekow Oppon, Alan Christoffels. ISCB Africa ASBCB Conference on Bioinformatics 2011. Cape Town, South Africa, March 2011. In-Silico TB drug design using comparative genome analysis of DS, MDR and XDR isolates from KZN.

Mar 2011 (Talk): Samuel Kwofie, Vlad Bajic, Alan Christoffels. ISCB Africa ASBCB Conference on Bioinformatics 2011. Cape Town, South Africa, March 2011. Inferring enriched biological information from graphs composed of text-derived biomedical concepts of ontologies related to Hepatitis C Virus.

Mar 2011 (Talk): George Obiero. ISCB Africa ASBCB Conference on Bioinformatics 2011. Cape Town, South Africa, March 2011. Comparative Annotation and Analysis of Protein-Coding DNA Sequences of Theileria parva Marikebuni against Theileria parva Muguga genomes.

Mar 2011 (Poster): Roetz, N, Möller, M, Tiffin, N, Christoffels A, Hoal, E. ISCB Africa ASBCB Conference on Bioinformatics, Cape Town, South Africa, March 2011. Variants in Host Susceptibility to Tuberculosis using SNP Array.

Mar 2011 (Poster): Tiffin, N, Hofmann, O, The SysCo Consortium, Schwegmann, A, Brombacher, F, Hide, W. ISCB Africa ASBCB Conference on Bioinformatics, Cape Town, South Africa, March 2011. Analysis of differential gene expression and regulatory networks in Leishmania-infected macrophages from susceptible and resistant mouse strains.

Mar 2011 (Poster): Stanley Kimbung, Jean-Marc Celton, Oreetseng Moncho, Lizex Husselman, Adugna Woldesemayat, Joseph Mafofo, Peter van Heusden, Jasper Rees, Alan Christoffels. ISCB Africa ASBCB Conference on Bioinformatics, Cape Town, South Africa, March 2011. A Computational Framework for Venturia inaequalis Genomics.

Mar 2011 (Poster): Emily Tangie, Vincent Titanji, Alfred Ngwa, Damian Anong, Stanley Mbandi, Ivo Tening, Raymond Yengo. ISCB Africa ASBCB Conference on Bioinformatics, Cape Town, South Africa, March 2011. Differential Immunoglobulin G Response to UB05 - a Potential Plasmodium falciparum Vaccine Target.

Mar 2011 (Poster): Alecia Naidu, Mmakamohelo Direko, Peter van Heusden, Junaid Gamieldien, Paul van Helden, Rob Warren, Nico Gey van Pittius, Alan Christoffels. ISCB Africa ASBCB Conference on Bioinformatics, Cape Town, South Africa, March 2011. Development of a computational framework for the management of next generation Mycobacterial Sequencing Data.

Mar 2011 (Poster): Musa Gabere, Alan Christoffels, William Noble, Vladimir Bajic. ISCB Africa ASBCB Conference on Bioinformatics, Cape Town, South Africa, March 2011. HAPP: Haematophagus antimicrobial peptide predictor.

Mar 2011 (Poster): Jean Yves Semegni, Mark Wamalwa, Gordon Harkins, Alistair Gray, Darren P Martin. ISCB Africa ASBCB Conference on Bioinformatics, Cape Town, South Africa, March 2011. NASP: A parallel program for identifying evolutionarily conserved nucleic acid secondary structures from sequence alignments.

Mar 2011 (Poster): Samson Muyanga, Ashley Pretorius, Firdous Khan, Alan Christoffels. ISCB Africa ASBCB Conference on Bioinformatics, Cape Town, South Africa, March 2011. Computational Discovery of Carotenoid Pathway Regulatory Networks.

Mar 2011 (Poster): Mark Wamalwa. ISCB Africa ASBCB Conference on Bioinformatics, Cape Town, South Africa, March 2011. The transcriptome profile of Glossina morsitans morsitans: a vector for sleeping sickness.

Mar 2011 (Poster): Adugna Woldesemayat, Junaid Gamieldien, Bongani Ndimba, Alan Christoffels. ISCB Africa ASBCB Conference on Bioinformatics, Cape Town, South Africa, March 2011. Computational identification of candidate genes for drought tolerance in Sorghum (Sorghum bicolor (L.) Moench).

May 2011 (Talk): Simon Travers. 18th International HIV Evolution and Dynamics Meeting. Galway, Ireland. 1 - 4 May 2011. Drug Resistance in HIV-1 Subtype C infected patients on ART in Karonga District, Malawi using consensus and next-generation sequencing.

Jun 2011 (Talk): Simon Travers. 5th SA AIDS Conference, Durban. 7-10 June 2011. Characterising the Emergence, Prevalence and Persistence of Drug Resistant Variants in the Viral Population of HIV-1 Subtype C Infected Individuals.

Mar 2011 (Poster): Z. Chikwambi, A Christoffels and D.J.G Rees (2011). Fruit. Biotechnology Fruit Conference. Pretoria. Developmental peel and pulp tissue specific mRNA expression profiling in Malus x domestica Borkh. Cv “Golden Delicious”

Sep 2011 (Poster): Samson Muyanga, Firdous Khan and Alan Christoffels. Horticulture GMO symposium. Kruger National Park, Mpumalanga. Sep 2011. Computational discovery of carotenoid pathway networks.

Nov 2011 (Poster): Junaid Gamieldien. The 6th International Conference on Genomics. Shenzhen, China. Driving Disease Gene Discovery with Biomedical Semantic Networks.

Nov 2011 (Poster): Mahjoubeh Jalali Sefid Dashti, Van Velden DP, Gamieldien J, Fisher LR, Marnewick J, Kidd M, Kotze M. The 6th International Conference on Genomics. Shenzhen, China. Evaluation of high-throughput methodology for multi-gene screening in South Africans at risk of cardiovascular disease.

Nov 2011 (Poster): GFO Obiero, PO Mireji, A Christoffels and D Masiga. ICIPE Research Open Day, Nairobi, Kenya. Bioinformatics approaches to finding the tsetse fly nose.

Nov 2011 (Poster): Tanov, E, Martin, DP, Muhire, B, Golden, M and Harkins, GW. UWC Life Science Research Open Day. Identification of biologically important secondary structures in Enterovirus genus.

Nov 2011 (Poster): Lambson, B, Ramdayal, K, Moore, PL, Abrahams, MR, Bandawe, Karim SA, Williamson C, Martin, DP, Harkins, GW and Morris L. UWC Life Science Research Open Day. Characterizing HIV-1 subtype C gp160 envelope sequence diversity in the female genital tract and plasma during acute and chronic infection.

Theses:

Musa Gabere (PhD, UWC): Prediction of antimicrobial peptides using hyperparameter optimised support vector machines.

Samuel Kwofie (PhD, UWC): Development of a Hepatitis C virus knowledgebase with computational prediction of functional hypothesis of therapeutic relevance.

Monique Maqungo (PhD, UWC): Prostate cancer knowledgebase with functional genomic data analysis.

Conor Meehan (PhD, National University of Ireland, Galway): Understanding the interaction between HIV-1 and chemokine receptors during host cell entry with focus on the potential for resistance to CCR5-antagonists.

Mark Wamalwa (PhD, UWC): Development of a comprehensive annotation and curation framework for analysis of Glossina morsitans morsitans expressed sequence tags.

Saleem Adam (MSc, UWC): A knowledge base of stress response gene-regulatory elements in Arabidopsis thaliana.


Thesis examination for students from other institutions:

Alan Christoffels: University of Pretoria (PhD); University of Cape Town (MSc)
Junaid Gamieldien: University of Stellenbosch (MSc)
Nicki Tiffin: University of the Witwatersrand (MSc)

Expert panel or professional membership:

Christoffels
• Board of Directors – International Society for Computational Biology
• Scientific Committee Chair for the African Society of Bioinformatics and Computational Biology Meeting in Cape Town
• Scientific committee: International Conference on Bioinformatics (InCoB): InCoB11 Malaysia Nov 2011. Official annual conference of the Asia-Pacific Bioinformatics Network (APBioNet)
• NRF Rating Panel

Travers
• Scientific committee: 19th International HIV Dynamics and Evolution Conference
• Review panel for South African Research Chairs Initiative (NRF): Bioinformatics and Functional Genomics Stream

Policy briefs:

Human Heredity and Health in Africa (H3Africa)
In response to the Human Heredity and Health in Africa (H3Africa) joint initiative funded by the Wellcome Trust (UK) and the National Institutes of Health (USA), the SANBI unit has partnered with multiple African researchers to propose a Pan African Bioinformatics network. While the outcome of this proposal is not available yet, SANBI will be one of the centers of excellence that will support the genetics community in Southern Africa.

Additionally, SANBI is the bioinformatics partner in multiple disease-specific research networks under this initiative.

Southern African Human Genome Programme (SAHGP)
Alan Christoffels together with six other academics across the country submitted a successful proposal to the Department of Science and Technology during June 2010 for seed funding to support a planning meeting to formalise the SAHGP and develop a 5-year plan for the SAHGP in South Africa. The planning meeting was held in January 2011 and attended by 65 researchers. A proposal to the value of 20 million rand has been submitted to the Department of Science and Technology.

Intervention programme:
As part of an inter-disciplinary project supported through the CDC-funded initiative, we have completed the alpha version of an interactive electronic medium (a DVD) to be used in schools for education relating to tuberculosis and life skills. The final printing of the DVD and manuals was completed in December 2011 and the final phase of piloting will take place in 2012. The electronic teaching tool has been placed on the internet as part of a web-resource to mirror the classroom toolkit (www.skills4life.org).


Research Projects Overview

Communicable Diseases

Mycobacterium tuberculosis:
• Virulence mutations: In collaboration with the Tygerberg MRC unit, we are developing methods to analyse high throughput sequencing data for microbial genomes.
• Identification of novel drug targets in pathways known to contain drug resistance genes.

Malaria:
• In collaboration with the National Institute for Communicable Diseases (NICD), we are investigating miRNA targets in Anopheles funestus to understand regulation of mosquito development.

HIV research:
• HIV drug resistance: In collaboration with groups in Malawi and Ireland we are studying CXCR4-usage during disease progression.
• Development of a software solution for user-friendly HIV drug resistance testing using the 454 sequencing platform.
• A longitudinal study investigating the evolution of HIV-1 subtype C viruses in the female genital tract relative to the blood plasma.

Non-communicable Diseases

• Multiple Sclerosis: Development of an exome sequencing and knowledge discovery pipeline to identify rare mutations.
• Breast Cancer: Development of a statistical method for cross-platform microarray analysis.

Agricultural research programme

• Fungal-host pathogens: In collaboration with Dr Rees, we are developing tools for capturing genomic data from crop and fungal genomes and for identifying pathogenic genes in infected groups.

Forthcoming Projects

• Rat model for post-traumatic stress disorder: In collaboration with a group at Tygerberg medical school, we will be extending our exome sequencing discovery pipeline to an analysis of RNASeq data obtained from a rat model.
• Salt-sensitive hypertension in African populations: In collaboration with clinicians at Groote Schuur we will sequence candidate genes in a series of 300 samples from normotensives, hypertensives and salt-sensitive hypertensives from the isiXhosa-speaking population in Cape Town, using next generation sequencing techniques. These data will be compared to existing data from northern hemisphere populations.


Research Laboratories

PI: Prof Alan Christoffels
DST/NRF Research Chair in Bioinformatics and Health Genomics 2009 - 2012

My genomics laboratory is primarily focused on developing methods to better understand host-pathogen interactions. Since 2010, my group has been working on applications of next generation sequencing technology to the understanding of diseases that impact health in South Africa and on the African continent. The interaction networks between host and pathogen are being studied in tuberculosis, in blood-borne disease vectors (tsetse and Anopheles) and, more recently, in fungal invasion of economically important crops in South Africa. My Research Chair in Bioinformatics currently supports 6 MSc students, 7 PhD students and 4 postdoctoral fellows, either directly or through external grants. The research activities have focused on next generation sequencing data and on developing methods to efficiently manage and analyse high-throughput data. The momentum of next generation sequencing in South Africa has led to our participation in additional projects to support bioinformatics needs.

Outputs:
• International and national conference presentations, 5 publications in 2011, 2 internships and 4 bioinformatics tools.
• Internships:
– Edwin Murungi (Mark Field’s Lab, Cambridge University, UK)
– Oreetseng Moncho (European Bioinformatics Institute, UK)

Tuberculosis (http://www.sanbi.ac.za/tb_genomics/)

Tuberculosis (TB) is prevalent in sub-Saharan Africa and, in the context of South Africa, the incidence of TB in the Western Cape is among the highest in the country. Together with HIV, these two diseases are a deadly combination. The severity of TB prevalence in South Africa is complicated by the presence of drug-resistant TB. Intervention strategies for TB range from clinical trials for new TB drug treatment to improved surveillance of multi-drug resistance informed by advances in TB diagnostic tests. Researchers at Tygerberg Medical School have sequenced clinical isolates of TB in South Africa. In collaboration with these investigators, we are:
• developing and implementing methodology for short read data from bacterial genomes
• developing bioinformatics resources for managing these genetic data sets to accelerate deeper insights into the underlying mechanisms of host evasion and virulence factors
• developing methods to correlate the genetic variation in TB isolates with an expanded interaction network of virulence TB genes
• identifying novel drug targets using in silico docking studies.

Collaborators: Profs Eileen Hoal Van Helden, Nico Gey Van Pittius, Rob Warren and Dr Cedric Werley, Tygerberg Medical School, University of Stellenbosch. Dr Ekow Oppon, South African Medical Research Council.

Blood-borne disease vectors (http://www.sanbi.ac.za/disease_vectors)

Sleeping Sickness

Tsetse (Glossina) is the vector for trypanosomes, which cause, among other diseases, human African trypanosomiasis (HAT). There are more than 300,000 cases of HAT, with millions more people at risk in 37 countries in Africa. Although not present in South Africa, HAT is prevalent in neighboring countries, with new cases being reported in countries such as Zimbabwe, Zambia and Mozambique. Insights into the interaction of the trypanosome and the host could promote improved intervention strategies. Together with our collaborators (listed in brackets) we are investigating:
• annotation of the tsetse genome (International Glossina Genome Consortium (IGGI))
• machine learning methods to identify immunity genes (Vlad Bajic, KAUST)
• comparative genomics of serine protease inhibitors (IGGI)
• protein-protein interactions between trypanosome and tsetse proteins (Prof Mark Field, Cambridge and Prof Henry Nyongesa, Computer Science, UWC)

IGGI consortium, including:
Matt Berriman, Sanger Centre
Serap Aksoy, Yale University
Dan Masiga, International Centre of Insect Physiology and Ecology, Kenya
Mike Lehane, Liverpool School of Tropical Medicine

Role of miRNA in Anopheles vectorial capacity (http://insectar.sanbi.ac.za)

miRNAs play an essential role in gene regulatory networks by controlling the expression of genes involved in important biological processes in the cell. In insects, thousands of miRNA genes have been identified, but the function of most of these miRNAs remains unknown because experimental and computational approaches to predict their exact target mRNAs are lacking. In collaboration with Prof Lizette Koekemoer, we are developing an integrated system to identify miRNA targets in Anopheles and other insects.

Diseases of Apple (http://www.sanbi.ac.za/agri_genomics)

Apple scab is one of the most destructive diseases of apple (Malus × domestica Borkh.) and is caused by the hemi-biotrophic fungus Venturia inaequalis (Cooke) Winter. Scab is a serious problem in all apple-producing regions of the world and requires a series of 12 to 15 fungicide sprays per year in commercial orchards. In total, eight races of the scab pathogen have been defined by incompatibility, determined by avirulence genes (avr genes), on corresponding host cultivars carrying a major resistance gene (R gene). Next generation sequencing data have been generated for apple fungal infection. Data management and downstream analysis protocols require a computational framework that allows users to engage with the data and promotes hypothesis driven research. The following bioinformatics projects are underway:

• establishing a local instance of ENSEMBL Plant and Fungal databases, genome browser and communication protocols
• transcriptome profiling of host-pathogen response
• integrating transcription data, SNP and CNV data and information from more selective sequencing strategies such as ChIP-Seq and Bisulphite sequencing (for methylation analysis).

Collaborator: Professor Jasper Rees - Agricultural Research Council


PI: Dr Junaid Gamieldien
Knowledge Integration & Biomarker Discovery Group

Core Project: Semantic Integration of Biomedical Knowledge

Our core research project, which many of our other projects rely on, focuses on the semantic integration and re-use of high-value biomedical information in the public domain to: 1) enable in-silico experimentation that encompasses multiple knowledge domains and 2) contextualise the results of high throughput experiments. We use a knowledge representation technique known as a semantic network, which is stored in a next-generation graph database. This greatly simplifies the integration of complex biological information in the way biologists think about and reason across it. The flagship project, focused on human health, seamlessly integrates hundreds of thousands of human, mouse and rat gene, gene-to-disease, gene-to-phenotype and gene-to-pathway relationships. The semantic database is particularly relevant in the disambiguation of experiments that generate large numbers of leads. For example, high throughput technologies like next generation sequencing make it possible to identify multiple gene candidates that may be of biomedical interest. We have tested the utility of our semantic database in clarifying the often unclear or unapparent links between novel mutations recently reported in the literature and the diseases or phenotypes being investigated, and have found that we can often provide better insights than the original publication. Our system often also uncovers the underlying biological mechanisms that lead to the development of phenotypes associated with a disease. While the utility of the current semantic network is clear, we are constantly adding relevant genomic information to the system and will prioritise adding genome scale knowledge on gene expression in specific tissues to the semantic network in 2012.
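To make the idea of reasoning across knowledge domains more concrete, the following minimal sketch stores typed relationships as plain triples and asks which genes share a pathway with a gene already linked to a disease. It is an illustration only: the gene, pathway and disease names, the relation labels and the function names are hypothetical placeholders, and an in-memory dictionary stands in for the graph database, whose schema and query language are not described here.

```python
from collections import defaultdict

# Hypothetical (subject, relation, object) triples; every name is a placeholder.
TRIPLES = [
    ("GENE_A", "associated_with_disease", "disease_X"),
    ("GENE_A", "member_of_pathway", "pathway_1"),
    ("GENE_B", "member_of_pathway", "pathway_1"),
    ("GENE_B", "associated_with_phenotype", "phenotype_P"),
    ("GENE_C", "member_of_pathway", "pathway_2"),
]


def build_index(triples):
    """Index triples in both directions so edges can be followed either way."""
    outgoing = defaultdict(set)
    incoming = defaultdict(set)
    for subj, rel, obj in triples:
        outgoing[(subj, rel)].add(obj)
        incoming[(obj, rel)].add(subj)
    return outgoing, incoming


def pathway_neighbours_of_disease_genes(triples, disease):
    """Return genes that share a pathway with a gene already linked to `disease`."""
    outgoing, incoming = build_index(triples)
    known = set(incoming[(disease, "associated_with_disease")])
    candidates = set()
    for gene in known:
        for pathway in outgoing[(gene, "member_of_pathway")]:
            candidates |= incoming[(pathway, "member_of_pathway")]
    # Genes already known for the disease are not new leads.
    return sorted(candidates - known)


if __name__ == "__main__":
    print(pathway_neighbours_of_disease_genes(TRIPLES, "disease_X"))
```

Indexing each relation in both directions keeps the two-step traversal (disease to gene, gene to pathway, pathway back to gene) short; a real graph database would express the same walk as a path query.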
Project 2: Disease Gene Discovery with Genome Sequencing and Semantic Networks

We are developing an exome sequencing + knowledge-discovery pipeline to identify the genetic cause(s) of disease in a patient with multiple sclerosis. We are developing a novel concept to prioritise mutations by mining the database for genes associated with ‘surrogate’ or ‘secondary’ phenotypes linked to disease (e.g. demyelination in multiple sclerosis). This includes direct human gene to phenotype information as well as transitive associations via model organism evidence (gene knockout phenotypes). The latter has the potential to identify rare mutations associated with a phenotype that would otherwise be missed, and formally mapping between phenotypes and diseases in the semantic network will therefore be prioritised in 2012. Other disease cases will also be sequenced and a version of the method will be applied in a large-scale RNAseq project studying a rat model of post-traumatic stress disorder.

Project 3: Statistical Methods for Cross-platform Microarray Analysis in Cancer

A large volume of health research focused gene expression data exists in public repositories, and there is a significant opportunity to re-use microarray data in various combinations for novel in-silico analyses that would otherwise be too costly to perform. For example, thousands of cancer experiments, where the aim was to identify genes being differentially expressed in normal versus tumour tissue, are available. We have developed a method for combining and re-analysing large numbers of data sets that may have been generated on different technology platforms as a means to increase the statistical power of the meta-analysis, while weakening the effects of individual study-specific biases. We are applying this method to identify driver genes in the development and metastasis of breast cancer. In 2012 the method will be applied to machine learning based classification of tumours (e.g. benign, malignant, metastatic drug resistant) based on gene expression signatures identified in the large merged datasets.
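One generic way to compare a gene across studies run on different platforms is to convert each study's tumour-versus-normal difference into a unitless effect size and then average those effect sizes. The sketch below does this with Cohen's d and a sample-size weighted mean. It is a textbook device used here for illustration and is not claimed to be the method developed in Project 3; the function names and the expression values are invented.

```python
from math import sqrt
from statistics import mean, variance


def standardised_difference(tumour, normal):
    """Cohen's d: difference in means divided by the pooled standard deviation."""
    n_t, n_n = len(tumour), len(normal)
    pooled_var = (
        (n_t - 1) * variance(tumour) + (n_n - 1) * variance(normal)
    ) / (n_t + n_n - 2)
    return (mean(tumour) - mean(normal)) / sqrt(pooled_var)


def combined_effect(studies):
    """Sample-size weighted mean of per-study effect sizes for a single gene."""
    weighted_sum = 0.0
    total_weight = 0
    for tumour, normal in studies:
        weight = len(tumour) + len(normal)
        weighted_sum += weight * standardised_difference(tumour, normal)
        total_weight += weight
    return weighted_sum / total_weight


if __name__ == "__main__":
    # Two invented studies measuring the same gene on very different scales.
    studies = [
        ([10.0, 11.0, 12.0], [8.0, 9.0, 10.0]),        # platform reporting log-scale values
        ([300.0, 320.0, 340.0], [260.0, 280.0, 300.0]),  # platform reporting raw intensities
    ]
    print(combined_effect(studies))
```

Because each study is standardised against its own variability before pooling, the platform's measurement scale drops out, which is the property a cross-platform analysis needs. A production method would also have to handle probe mapping, batch effects and zero-variance groups, none of which are covered here.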


PI: Dr Gordon Harkins

My research primarily focuses on the evolution and molecular epidemiology of ssDNA and RNA viral pathogens of animals and plants. I am a key member of a fledgling, but already highly productive, plant-virus epidemiology network seeking to determine the evolutionary underpinnings of the emergence and spread of the numerous novel geminiviral agricultural diseases that seriously threaten the food security of Africa and the rest of the developing world. Together with my collaborators I have been investigating nucleotide sequence data from a broad range of virus species to determine characteristic features of the population histories, evolution rates and migration patterns associated with the geminivirus emergence events that have recently been detected in Africa, South America and the Pacific Rim. Besides informing policy makers on the potential risks associated with relaxed controls on the movements of agricultural produce, this work will hopefully identify correlates of impending virus emergence that could form the basis of much needed pandemic early warning systems (such as those currently in place for influenza A). A summary of some of the research projects that I have been involved in during 2010-2011 is presented below.

The identification of biologically important secondary structures in single stranded RNA and DNA viral pathogens

Besides a capacity to store information within the sequences of their component nucleotides, the genomes of RNA viruses can also potentially store information within their folded secondary structures. RNA viral genomes often contain conserved secondary structures that play a vital role during the various stages of the viral life cycle, influencing many biological processes such as genome replication, viral packaging, intracellular trafficking, gene expression and genetic recombination.
While much is known about regulatory motifs in RNA at the 5’ and 3’ untranslated regions, most potential regulatory elements within RNA viral genomes likely remain uncharacterised. These regulatory motifs constitute an important component of the genetic code and as such indicate that much remains to be discovered by the analyses of single stranded RNA/DNA viral genomes and intact messenger RNAs (mRNAs). Therefore, an efficient and accurate structure prediction methodology can give vital directions to experimental studies aiming to evaluate the function of these conserved secondary structure architectures. We have devised such a tool, called NASP (Nucleic Acid Structure Prediction), that identifies evolutionarily conserved nucleic acid secondary structures (Semegni et al. 2011). It takes as input a nucleotide sequence alignment and returns the most probable evolutionarily conserved consensus secondary structure. Downloadable and web-based versions of NASP are freely available at http://web.cbio.uct.ac.za/~yves/nasp_portal.php

NASP: A Parallel Program for Identifying Evolutionarily Conserved Nucleic Acid Secondary Structures from Sequence Alignments. Semegni et al. 2011 Bioinformatics 27: 2443-2445.

The evolution and molecular epidemiology of Tomato yellow leaf curl virus (TYLCV)

The ongoing global spread of Tomato yellow leaf curl virus (TYLCV; Genus Begomovirus, Family Geminiviridae) represents a serious looming threat to tomato production in all temperate parts of the world. Whereas determining where and when TYLCV movements have occurred could help curtail its spread and prevent future movements of related viruses, determining the consequences of past TYLCV movements could reveal the ecological and economic risks associated with similar viral invasions.
Towards this end we applied Bayesian phylogeographic inference and recombination analyses to available TYLCV sequences (including those of 15 new Iranian full TYLCV genomes) and reconstructed a plausible history of TYLCV’s diversification and movements throughout the world. In agreement with historical accounts, our results suggest that the first TYLCVs most probably arose somewhere in the Middle East between the 1930s and 1950s (with 95% highest probability density intervals 1905–1972) and that the global spread of TYLCV only began in the 1980s after the evolution of the TYLCV-Mld and -IL strains. Despite the global distribution of TYLCV we found no convincing evidence anywhere other than the Middle East and the Western Mediterranean of epidemiologically relevant TYLCV variants arising through recombination.

The Spread of Tomato Yellow Leaf Curl Virus (TYLCV) from the Middle East to the World. Lefeuvre et al. 2010 PLoS Pathogens 6(10): e1001164. doi:10.1371/journal.ppat.1001164.

Determining the long-term evolutionary rate of geminivirus integrons from Nicotiana genomes

Whereas analyses of geminivirus substitution rates estimated using temporally structured datasets have indicated that these single stranded DNA viruses are evolving as fast as many animal and plant RNA viruses, it is still unknown when these viruses originated. Current hypotheses range from their having originated long before the evolution of flowering plants (>130 MYA) to their being only a few hundred thousand years old. A recently discovered geminivirus fossil within the genome of some tobacco species indicates that relatively modern looking geminivirus-like viruses must have already been in existence between 0.2 and 9 MYA. We are attempting to use the reconstructed ancestral sequences of the integrated geminivirus sequences to place upper and lower bounds on the date when geminiviruses originated. This project brings sophisticated molecular clock analyses of plant and virus sequences together with both geological data on continental drift and paleontological fossil data on plant and insect evolution to reveal what will be the first concrete geological-time frame histories of a modern virus family.

The Time-scale of Begomovirus Evolution: Evidence from Integrated Sequences in the Nicotiana genome. Lefeuvre et al. 2011 PLoS ONE 6(5): e19193. doi:10.1371/journal.pone.0019193.
The historical spatial diffusion dynamics of Maize streak virus strain A (MSV-A)

Maize streak virus strain A (MSV-A), the etiological agent of maize streak disease, represents one of the most serious biotic threats to African food security. Determining where MSV-A originated and how it spread transcontinentally could yield valuable insights into its historical emergence as a crop pathogen. Similarly, determining where the major extant MSV-A lineages arose could identify geographical hot spots of MSV evolution. We have used model-based phylogeographic analyses of 353 fully sequenced MSV-A isolates to reconstruct a plausible history of MSV-A movements over the past 150 years. We show that since the probable emergence of MSV-A in southern Africa around 1863, the virus spread transcontinentally at an average rate of 32.5 km/year (95% highest probability density interval, 15.6 to 51.6 km/year). Using distinctive patterns of nucleotide variation caused by 20 unique intra-MSV-A recombination events, we tentatively classified the MSV-A isolates into 24 easily discernible lineages. Despite many of these lineages displaying distinct geographical distributions, it is apparent that almost all have emerged within the past 4 decades from either southern or east-central Africa. Collectively, our results suggest that regular analysis of MSV-A genomes within these diversification hot spots could be used to monitor the emergence of future MSV-A lineages that could affect maize cultivation in Africa.

Reconstructing the History of Maize Streak Virus Strain-A Dispersal to Reveal Diversification Hotspots and its Initial Origins in Southern Africa. Monjane et al. 2011 The Journal of Virology, September, Vol. 85, No. 18, p9623-9636.
Reconstructing the evolutionary history of psittacine beak and feather disease

Psittacine beak and feather disease (PBFD) is one of the most devastating emerging diseases affecting both wild and captive psittacine birds and poses a serious threat to the health of pet birds and the conservation of threatened species. First described in 1975 in various species of Australian cockatoos, the disease has since been reported in more than 60 psittacine species in eleven countries around the globe. Beak and feather disease virus (BFDV), family Circoviridae (genus Circovirus), has been identified as the etiological agent, with surveys indicating that prevalence rates vary between 10% and 94% in both captive and wild psittacines. This study, conducted by James Matthews, is the first to fully exploit a recently published Bayesian phylogeographic inference method to better understand viral emergence and geographical dissemination of BFDV using full-genome data. Under this framework we will model spatial diffusion on time-measured genealogies as a continuous-time Markov chain over discrete sample locations. This temporal-spatial process will be simultaneously integrated with well-established models of sequence evolution in a Bayesian genealogical approach using the software package BEAST (Bayesian Evolutionary Analysis of Sampling Trees), allowing for the inference of historical spatial dynamics over time. Furthermore, we are evaluating a range of potential predictors of viral dissemination between pairs of countries as phylogeographic models and fitting these models individually to the BFDV whole genome sequence data. These include (i) geographical distances between countries, (ii) the number of psittacines transported annually from one country to another (with directionality), (iii) the psittacine population size in the country of origin, (iv) the psittacine population size in the country of destination, and (v) the product of the psittacine population sizes in the country of origin and the country of destination.
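For readers unfamiliar with the continuous-time Markov chain used for discrete locations, the core quantity is the matrix of transition probabilities P(t) = exp(Qt), where Q holds the migration rates between locations and each row of Q sums to zero. The sketch below computes P(t) for a small rate matrix using a truncated Taylor series with scaling and squaring. The location names and rates are invented, the function names are hypothetical, and this is not how BEAST is invoked or implemented; it only illustrates the underlying model.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def transition_probabilities(rate_matrix, t, squarings=10, terms=12):
    """Approximate P(t) = exp(Q t) by Taylor series plus scaling and squaring."""
    n = len(rate_matrix)
    scale = t / (2 ** squarings)
    small = [[q * scale for q in row] for row in rate_matrix]
    result = identity(n)
    term = identity(n)
    for k in range(1, terms + 1):
        # term becomes (Q * scale)^k / k!
        term = [[x / k for x in row] for row in mat_mul(term, small)]
        result = [[r + s for r, s in zip(r_row, t_row)]
                  for r_row, t_row in zip(result, term)]
    # Undo the scaling: exp(Q t) = (exp(Q t / 2^s))^(2^s).
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result


if __name__ == "__main__":
    # Invented example: three locations, rates in migration events per year.
    locations = ["country_A", "country_B", "country_C"]
    Q = [
        [-0.3, 0.2, 0.1],
        [0.1, -0.2, 0.1],
        [0.05, 0.05, -0.1],
    ]
    P = transition_probabilities(Q, t=5.0)
    for name, row in zip(locations, P):
        print(name, [round(p, 3) for p in row])
```

Entry P[i][j] is the probability that a lineage in location i at the start of a branch of length t is in location j at its end; the predictors listed above would enter such a model by parameterising the off-diagonal rates of Q.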


PI: Dr Nicki Tiffin

Introduction

I work on human genetics underlying disease, specifically in African populations, aiming to characterise genetic diversity in South African patient populations within the disease context. I research generic computational disease gene prediction, candidate disease gene prioritisation for specific diseases, and genetics of host response to infectious disease. Ongoing projects include a collaborative project establishing a registry of patients from Cape Town who have systemic lupus erythematosus (SLE). We are building a database for effective storage and datamining of extensive clinical and biochemical data for these patients, and will use these data to design and implement -omic studies to further elucidate the genetic and environmental contributors to this disease. I also work with clinical collaborators to investigate genetic factors underlying susceptibility to salt-sensitive hypertension in South African patients. I continue research in the area of generic approaches to computational disease gene prioritisation, and I am completing research with the SYSCO Consortium investigating the response of host macrophages to infection with Leishmania major, with our findings under preparation for publication.

Research projects

1. Genetic factors underlying systemic lupus erythematosus (SLE) in South African patients

SLE is a multi-systemic autoimmune disease with a broad range of clinical presentations, and high associated morbidity and mortality. The incidence and prevalence of SLE vary significantly in different ethnic groups and populations (1, 2), including the South African patient population (3). There are, however, scant data on SLE in sub-Saharan Africa (4).
We have established a comprehensive registry of SLE patients at Groote Schuur Hospital, Cape Town (the first 100 patients were recruited by December 2011) to aid better understanding of the occurrence, biochemical, clinical, diagnostic, prognostic, therapeutic and ‘quality of life’ features of SLE in South Africa, and to better provide appropriate treatment for South African patients. The registry is a resource for research into the genetics underlying SLE in South Africa, and DNA samples are being biobanked for future molecular research. Two research papers are under consideration for publication.

Collaborators: Dr Ikechi Okpechi (MBBS, FWACP, PhD); Dr Asgar Kalla (MBChB, FCP (SA), MD, FRCP (Lond)); Dr Ayanda Gcelu (MBChB, FCP (SA), MPH)

2. Genetic factors underlying salt-sensitive hypertension in South African patients

This is an ongoing collaboration to investigate candidate genes for salt-sensitive hypertension, which presents with sustained elevation in blood pressure with no known underlying cause. The heritability of hypertension ranges from 30% to 60%, with variable clinical presentation and drug response (5, 6), and salt-sensitive hypertension appears to be more prevalent in people of indigenous African origin (7-9). We have previously identified candidate genes for salt-sensitive hypertension in Africans, and are applying next-generation sequencing methods to identify disease-associated variations in these genes in indigenous African patients and controls.

Collaborators: Professor Brian Rayner (MBChB, FCP SA); Mr CJ Van Heerden (Central Analytical Facilities, DNA Sequencing Unit, Stellenbosch University)

3. Generic approaches to disease gene prioritisation

We are developing a novel approach to enable disease gene prediction using the position of genes within the genome structure. This approach is the first of its kind to look at the frequency of recombination and the proximity of recombination hotspots in relation to the likelihood that neighbouring genes are implicated in disease.


Epidemiological evidence has clearly and consistently shown that disease occurrence and the genetics underlying disease can vary substantially between populations and ethnic groups (Via et al. 2009). We are investigating the prioritisation of disease genes in a population/ethnic-specific way given the haplotype block structure of a population.

Masters Student: Ms Tracey Kibler, SANBI, UWC

Research Output: Tiffin, N. Book Chapter: Methods in Molecular Biology: In Silico Tools for Gene Discovery. Chapter title: Conceptual thinking for prioritization of candidate disease genes. Methods Mol Biol. 2011;760:175-87.

4. Host genetics underlying response to Leishmania major

Leishmaniasis is a severe disease caused by protozoan Leishmania parasites, transmitted by the bite of the sand fly. We are completing three years of studies by the SYSCO Consortium, funded by the European Union 6th framework. I work on computational aspects of the project, analysing gene expression array data from host macrophages. Manuscripts presenting the data from these studies are under preparation.

Collaborators: The Sysco Consortium (http://asahttp.drim.com/syscoproject/); Dr Frank Brombacher and Dr Anita Schwegmann

Outputs: Tiffin, N., Hofmann, O., The SysCo Consortium, Schwegmann, A., Brombacher, F., Hide, W. Analysis of differential gene expression and regulatory networks in Leishmania-infected macrophages from susceptible and resistant mouse strains. Poster: ISCB Africa ASBCB Conference on Bioinformatics, Cape Town, South Africa, 2011.

Future direction

I have been actively involved in the formation of the AfriCRAN Consortium, which is an African-wide collaboration aiming to elucidate genetic and environmental causes of craniofacial abnormalities in African populations; and I have joined research networks actively seeking funding for research into genetics of Alzheimer’s disease, pharmacogenomics in African populations, kidney disease in Africa, and the development of the African Bioinformatics Network.

References:
1. Hopkinson, N. D., Doherty, M. & Powell, R. J. (1994) Ann Rheum Dis 53, 675-80.
2. Johnson, A. E., Gordon, C., Palmer, R. G. & Bacon, P. A. (1995) Arthritis Rheum 38, 551-8.
3. Okpechi, I. G., Rayner, B. L., van der Merwe, L., Mayosi, B. M., Adeyemo, A., Tiffin, N. & Ramesar, R. (2010) PLoS One 5, e9086.
4. Bae, S. C., Fraser, P. & Liang, M. H. (1998) Arthritis Rheum 41, 2091-9.
5. Shih, P. A. & O'Connor, D. T. (2008) Hypertension 51, 1456-64.
6. Lifton, R. P., Gharavi, A. G. & Geller, D. S. (2001) Cell 104, 545-56.
7. Weinberger, M. H. (1996) Hypertension 27, 481-90.
8. Sullivan, J. M., Prewitt, R. L. & Ratts, T. E. (1988) Am J Med Sci 295, 370-7.
9. Rayner, B. L., Myers, J. E., Opie, L. H., Trinder, Y. A. & Davidson, J. S. (2001) S Afr Med J 91, 594-9.


PI: Prof Simon Travers

Simon Travers is the principal investigator of the HIV molecular evolution research group. He completed his undergraduate degree in Biotechnology at the National University of Ireland, Maynooth in 2001 and his PhD (Bioinformatics) in 2004, also at NUI Maynooth. Following his PhD he undertook post-doctoral research with Dr Mario Fares in NUI Maynooth and Trinity College, Dublin. In late 2006 he received funding from the Irish Health Research Board (HRB) and established his research group, initially in NUI Maynooth before moving to NUI Galway. He has been at SANBI since April 2010. His research focuses on the implementation of computational approaches to study various aspects of HIV evolution. He is particularly interested in the study of drug resistance in HIV, and in more recent years this focus has shifted to using ultradeep sequencing approaches to characterise the entire spectrum of viral variants present within HIV infected individuals. In particular, he is interested in understanding the role of low abundance drug resistant variants in treatment outcome. Further research interests include using molecular phylogenetics to understand viral diversity and evolution, studying the molecular mechanisms driving coreceptor tropism switch in HIV, and characterising N-linked glycosylation in HIV to further understand the therapeutic potential of N-linked glycans.

The role of N-linked glycosylation in HIV

As part of the post-translational processing of an HIV virion, carbohydrates are added to the surface of the virion by the host's glycosylation mechanism. The binding of such N-linked glycans conveys protection to a virion's surface proteins by acting as a shield to avoid detection by the host's immune system.
These carbohydrates, however, may comprise a novel target for HIV therapeutics and Natasha Wood (postdoctoral researcher) is currently studying the three-dimensional properties of this 'glycan shield' to further understand its therapeutic potential. Collaborators: Prof Robert Woods, NUI Galway, Ireland Dr Elisa Fadda, NUI Galway, Ireland. Dr Simon Lovell, University of Manchester, UK Development of 454 analysis pipelines Ram Krishna Shrestha (PhD student) is working on a project focused on the management and analysis of HIV-1 ultra-deep sequencing data. He is developing methods to process and analyse HIV sequence data generated using 454 sequencing technology for the detection of drug resistance. His project also focuses on analysis of 454 sequence data from individuals infected with HIV-1 subtype C. This data is being used to examine the effect of minor variant drug resistant viruses on the treatment outcome of individuals infected with HIV-1 subtype C. Collaborators: Dr Grace McCormack (NUI Galway, Ireland) The Karonga Prevention Study (Malawi) Prof Maria Papathanasopoulos (Wits Medical School) Outputs: A talk at the launch of the NICD 454 sequencing platform. A novel method for the quality control of 454 sequence data (paper submitted to Bioinformatics). The role and mechanisms of CXCR4-usage in HIV-1 subtype C. Saleema Crous (MSc student) is working on coreceptor usage in HIV-1 subtype C. Transmitted viruses mostly use the CCR5 chemokine receptor as a coreceptor for host cell entry. Using sequence data from viruses whose


coreceptor preference is known, this project aims to better understand the molecular mechanisms that lead to the change of coreceptor usage in HIV-1 subtype C.

Outputs: Identified the optimal method for the prediction of CXCR4-usage in subtype C sequences (paper submitted).

Molecular phylodynamics of HIV focusing on the transmission of HIV drug resistance

Using data collected from a number of HIV cohorts, Fredrick Nindo (MSc student) is using molecular phylodynamic approaches to better understand the epidemiology of these cohorts. Combining sequence data with epidemiological information, Fred is using phylogenetic approaches to model the extent of transmission networks of HIV within these populations. The resulting transmission clusters will be further correlated with the presence of drug resistance mutations to describe the levels of transmission of resistant virus between individuals.

Collaborators: Dr Grace McCormack (NUI Galway, Ireland), The Karonga Prevention Study (Malawi)
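The idea of grouping sequences into putative transmission clusters and then relating those clusters to drug resistance can be illustrated with a minimal sketch. It is a deliberate simplification and not the group's method: instead of a phylogenetic or phylodynamic model, it links aligned sequences whose pairwise genetic distance falls below a threshold, then lists the carriers of a resistance mutation in each cluster. The patient labels, sequences, the 0.06 distance threshold and the resistance flags are all hypothetical.

```python
from itertools import combinations

def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences (gaps skipped)."""
    sites = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not sites:
        return 1.0
    return sum(x != y for x, y in sites) / len(sites)

def distance_clusters(seqs, threshold):
    """Single-linkage clusters: sequences closer than `threshold` share a cluster."""
    parent = {name: name for name in seqs}

    def find(name):
        # Follow parent links to the cluster representative.
        while parent[name] != name:
            parent[name] = parent[parent[name]]  # path compression
            name = parent[name]
        return name

    for a, b in combinations(seqs, 2):
        if p_distance(seqs[a], seqs[b]) < threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for name in seqs:
        clusters.setdefault(find(name), set()).add(name)
    return sorted(clusters.values(), key=lambda members: sorted(members))

# Hypothetical aligned sequences from four individuals.
sequences = {
    "P1": "ACGTACGTACGTACGTACGT",
    "P2": "ACGTACGTACGTACGTACGA",
    "P3": "ACGTACGAACGTACGTACGA",
    "P4": "TTGTACCTAGGTACTTACCT",
}
# Hypothetical set of individuals carrying a resistance mutation.
resistant = {"P1", "P3", "P4"}

clusters = distance_clusters(sequences, threshold=0.06)
for members in clusters:
    if len(members) > 1:
        carriers = members & resistant
        print(sorted(members), "resistant carriers:", sorted(carriers))
```

A fixed distance cut-off is only one possible criterion; real analyses typically also require statistical support for a cluster on the phylogeny, which this sketch does not attempt.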


Research Collaborations

Genetic factors underlying systemic lupus erythematosus (SLE) in South African patients

Nicki Tiffin collaborating with:

Dr Ikechi Okpechi, Dr Asgar Kalla, Dr Ayanda Gcelu

Department of Nephrology and Hypertension, UCT/Groote Schuur Hospital

Nature and purpose: This collaboration has established a patient registry and database for SLE patients in Cape Town, for the purpose of clinical and genetic research into this disease. Patients are recruited to the registry on an ongoing basis from the clinics at Groote Schuur Hospital, patient DNA is biobanked for future research, and biochemical and clinical data are collected and databased at each patient visit.

Output in the last 12 months: Research outputs for 2011 included two papers submitted to peer-reviewed journals in November 2011: “Clinicopathological insights into lupus nephritis in South Africans: a study of 251 patients” and “A diverse array of genetic, cellular and environmental factors converge in the pathogenesis of Systemic Lupus Erythematosus”.

Future direction: We will conduct a thorough analysis of the clinical and biochemical features of the patients recruited to the registry thus far (110 patients as of February 2012), and are constructing a relational database for effective entry, storage and querying of patient data. We are actively seeking funding to undertake research into the genetics and molecular processes underlying lupus in these patients, and will also perform computational analyses to predict candidate disease genes for lupus.

Genetic factors underlying salt-sensitive hypertension in South African patients

Nicki Tiffin collaborating with:

Prof Brian Rayner

Department of Nephrology and Hypertension, UCT/Groote Schuur Hospital

Mr C. J. Van Heerden

Central Analytical Facilities, DNA Sequencing Unit, Stellenbosch University

Nature and purpose: This collaboration aims to elucidate the genetic factors underlying salt-sensitive hypertension in African patients.

Output in the last 12 months: In 2011 we selected a group of patients with salt-sensitive hypertension, a group of patients with essential hypertension (not salt-sensitive), and a group of normotensive controls. In all, about 300 samples have been submitted for next-generation sequencing to identify variations in a primary candidate gene.

Future direction: The data generated from the sequencing analysis will be used to identify variants in the PTH gene that might contribute to salt-sensitive hypertension in Cape Town patients.

Host genetics underlying response to Leishmania major

Nicki Tiffin collaborating with:

The SysCo Consortium

For the full members list see http://asahttp.drim.com/syscoproject/

Drs Frank Brombacher and Anita Schwegmann

Immunology and Infectious Diseases, ICGEB/UCT


Nature and purpose: This collaboration, funded by the European Union 6th Framework Programme, aims to elucidate the genetic factors and regulatory pathways that underlie the response of host macrophages to infection with L. major.

Output in the last 12 months: Tiffin, N., Hofmann, O., The SysCo Consortium, Schwegmann, A., Brombacher, F., Hide, W. Analysis of differential gene expression and regulatory networks in Leishmania-infected macrophages from susceptible and resistant mouse strains. Poster: ISCB Africa ASBCB Conference on Bioinformatics, Cape Town, South Africa, 2011.

Future direction: This project is completed and preparation of findings for publication is ongoing.

Susceptibility to Mycobacterium tuberculosis infection in HIV-negative patients

Nicki Tiffin collaborating with:

Prof Eileen Hoal Van Helden

Biomedical Sciences, Stellenbosch University

Nature and purpose: This collaboration involves the computational analysis and databasing of genotyping data generated from M. tuberculosis-infected patients and matched controls.

Output in the last 12 months: Roetz, N., Möller, M., Tiffin, N., Christoffels, A., Hoal, E. Poster: Detecting Copy Number Variants in Host Susceptibility to Tuberculosis using SNP Array. ISCB Africa ASBCB Conference on Bioinformatics, Cape Town, South Africa, 2011.

Genomic and proteomic determinants of Mycobacterium tuberculosis phenotypic characteristics

Alan Christoffels and Junaid Gamieldien collaborating with:

Prof NC Gey van Pittius, Prof Rob Warren

Stellenbosch University

Nature and purpose: An analysis of the genomic, transcriptomic and proteomic variations giving rise to phenotypic characteristics in strains of Mycobacterium tuberculosis which enhance its ability to survive within its host and evade the host’s defense mechanisms.

Output in the last 12 months: High-throughput sequencing data for 8 TB genomes were received and assembled at SANBI, together with preliminary annotations. PhD student Alecia Naidu presented her computational pipeline at the international Bioinformatics Conference in Cape Town in March 2011. MSc student Mmakamohelo Direko submitted her thesis in December 2011, in which she assembled the M. oryx genome and identified SNPs that will be validated in 2012. Developed a web resource to access the TB data (www.sanbi.ac.za/tb_genomics).

Future direction: The annotated TB genomes will be analysed during 2012 to validate the nucleotide variations and rearrangements in the TB genome, and the results submitted for publication. Apply our protocols to a large number of newly sequenced TB strains. Training of students in the laboratory of our collaborator at Tygerberg Medical School.


Development of an integration knowledge system for complex data

Junaid Gamieldien collaborating with:

Dr Veronique Vaslin, Prof David Klatzman, Dr Adrien Six

Immunologie-Immunopathologie-Immunothérapie research institute, Paris

Nature and purpose: The immunology institute in Paris has generated immunomics-type experimental data on a large scale but does not have adequate methods to store and manage this information. The SANBI/MRC unit will develop an integration database system tailored to immunology-type datasets.

Output in the last 12 months: Veronique Vaslin visited SANBI to explore the details of the collaboration. Junaid Gamieldien developed a prototype of the integration system that will be used for the French collaboration, and presented it at the immunology institute in Paris during 2011.

Future direction: Further development of the data integration system to support the immunology-type datasets. A shared PhD student is being recruited for 2012.

International Glossina Genome Initiative (IGGI) Consortium

Alan Christoffels collaborating with IGGI Consortium members, including:

Serap Aksoy

Yale University, US

Dan Masiga

International Centre of Insect Physiology and Ecology, Kenya

Matt Berriman

Sanger Institute, UK

Loyce Okedi

National Livestock Health Research Institute, Tororo, Uganda

Mike Lehane

Liverpool School of Tropical Medicine, Liverpool, UK

Nature and purpose: To sequence the Glossina morsitans genome in order to rapidly provide an evaluation of the translational impact on eradication of the vector of sleeping sickness in Africa. (Present on all borders of South Africa.)

Output in the last 12 months: Sequencing and assembly of the Glossina morsitans genome in October 2011. Multiple working groups in the consortium were assigned specific sections to write for the genome paper.

Future direction: Analysing expression data to be compiled as satellite papers after the genome paper is published. Hosting a workshop in Kenya for genome annotation.

RNA Secondary Structure Prediction

Gordon Harkins collaborating with:

Y. Semegni, D. Martin

Institute of Infectious Diseases and Molecular Medicine, UCT

A. Varsani

School of Biological Science, University of Canterbury, New Zealand

M. Wamalwa

BecA-ILRI hub Nairobi, Kenya


Nature and purpose: While much is known about regulatory motifs in RNA at the 5’ and 3’ untranslated regions, most potential regulatory elements within RNA viral genomes likely remain uncharacterised. These regulatory motifs constitute an important component of the genetic code, which indicates that much remains to be discovered through the analysis of single-stranded RNA/DNA viral genomes and intact messenger RNAs (mRNAs). An efficient and accurate structure prediction methodology can therefore give vital direction to experimental studies aiming to evaluate the function of these conserved secondary structure architectures. We have devised such a tool, NASP (Nucleic Acid Structure Prediction), which identifies evolutionarily conserved nucleic acid secondary structures (Semegni et al. 2011). It takes as input a nucleotide sequence alignment and returns the most probable evolutionarily conserved consensus secondary structure. NASP provides statistical support for the folding predictions and the overall presence of secondary structure. By combining these predictions with co-variation, recombination and synonymous substitution rate analysis, conserved RNA secondary structures can be reliably identified. This allows us to test for a) evidence of purifying selection pressures acting upon synonymous sites within protein coding regions, b) evidence that sites predicted to be paired within secondary structures are co-evolving and c) evidence that recombination that naturally occurs among virus genomes has tended to preserve the secondary structures more than would be expected if the observed recombination events were randomly distributed throughout the genome.
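Test b) above, whether sites predicted to be paired are co-evolving, can be illustrated with a small sketch. This is not NASP's algorithm and carries no statistical test; it only shows the underlying intuition that a genuine base pair stays complementary across an alignment while the actual nucleotides change. The function name, the toy alignment and the choice to count G-U wobble pairs as compatible are assumptions made for illustration.

```python
# Canonical RNA base pairs, including the G-U wobble pair (an assumption here).
CANONICAL_PAIRS = {
    ("A", "U"), ("U", "A"),
    ("G", "C"), ("C", "G"),
    ("G", "U"), ("U", "G"),
}

def pair_support(alignment, i, j):
    """Summarise how well alignment columns i and j (0-based) behave as a base pair."""
    if not alignment:
        raise ValueError("alignment must contain at least one sequence")
    observed = [(seq[i], seq[j]) for seq in alignment]
    compatible = [pair for pair in observed if pair in CANONICAL_PAIRS]
    return {
        # Share of sequences in which the two sites can still pair.
        "fraction_compatible": len(compatible) / len(observed),
        # Number of different pairing combinations seen: more than one
        # suggests compensatory (co-varying) substitutions.
        "distinct_pair_types": len(set(compatible)),
    }

# Hypothetical toy alignment: the first and last columns are the predicted pair.
toy_alignment = ["GAAAC", "CAAAG", "AAAAU", "GAAAU"]
support = pair_support(toy_alignment, 0, 4)
unpaired = pair_support(toy_alignment, 1, 2)
print(support)
print(unpaired)
```

A pair of columns that remains fully compatible while showing several distinct pair types is the pattern expected under co-evolution; columns that never form a canonical pair, such as the second call above, give no such support.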
Output in the last 12 months: Downloadable and web-based versions of the software programme Nucleic Acid Structure Prediction (NASP) are freely available at http://web.cbio.uct.ac.za/~yves/nasp_portal.php

A single publication: NASP: A Parallel Program for Identifying Evolutionarily Conserved Nucleic Acid Secondary Structures from Sequence Alignments. Semegni et al. 2011. Bioinformatics 27: 2443-2445.

Future direction: A master’s student is currently investigating the evolutionarily conserved nucleic acid secondary structures identified in the positive-sense single-stranded RNA viral families Picornaviridae and Caliciviridae. Specifically, he is looking for a) evidence of purifying selection pressures acting upon synonymous sites within protein coding regions, b) evidence that sites predicted to be paired within secondary structures are co-evolving and c) evidence that recombination that naturally occurs among virus genomes has tended to preserve the secondary structures more than would be expected if the observed recombination events were randomly distributed throughout the genome.

Characterizing HIV-1 subtype C gp160 envelope sequence diversity in the female genital tract and plasma during acute and chronic infection

Gordon Harkins collaborating with:

L. Morris, P. Moore

National Institute for Communicable Diseases, Pretoria

B. Lambson M. Abrahams D. Martin

Institute of Infectious Diseases and Molecular Medicine, UCT

G. Bandawe P. Lemey

Department of Microbiology and Immunology, Leuven University, Belgium

S. Karim

CAPRISA, University of KwaZulu Natal, Durban


Nature and purpose: We are investigating whether differences exist between human immunodeficiency viruses in the female genital tract and blood plasma through a longitudinal study of the sequence diversity in HIV-1 infected patients during chronic and acute infection. To date we have acquired HIV-1 envelope sequence data from sampling and sequencing work conducted by CAPRISA and the National Institute for Communicable Diseases (NICD). The data comprise a cohort of four HIV-1 positive women who have not been exposed to antiretroviral treatment (ART) throughout their participation in this study, with a total of 449 samples collected at time intervals ranging from 14 to 1316 days post sero-conversion. Using a hierarchical Bayesian statistical inference approach we have estimated the number of variants that each patient was infected with and evaluated the degree of viral compartmentalisation of HIV-1 subtype C viruses within the female genital tract and blood plasma. An improved understanding of viral evolution within the female genital tract during acute and chronic infection should contribute to the development of more effective treatments and prevention strategies to block or reduce heterosexual and perinatal transmission of HIV.

Output in the last 12 months: A single paper is in preparation for submission to a peer-reviewed journal.

Future direction: A comparative analysis of structured coalescent and hierarchical phylogenetic viral diffusion models is currently being performed, and a single paper and at least three conference presentations are expected to result from this study.

Geminivirus Collaborative Network

Gordon Harkins collaborating with:

D. Martin

Institute of Infectious Diseases and Molecular Medicine, UCT

D. Shepherd

Department of Molecular and Cell Biology, UCT

J. Khan

Department of Crop Sciences, Sultan Qaboos University, Oman

A. Varsani

Department of Biological Sciences, University of Canterbury, New Zealand

J. Brown

Plant Pathology Department, University of Arizona, USA

P. Lemey

Department of Microbiology and Immunology, Leuven University, Belgium

P. Roumagnac

CIRAD, Montpellier, France

J. Lett P. Lefeuvre

CIRAD-Université de la Réunion, Isle de la Réunion

Nature and purpose: Pervasive food insecurity is a major determinant of health in sub-Saharan African countries, where life-expectancy rates remain among the lowest in the world and where malnutrition ranks among the greatest causes of ill health. While the causes of this situation are undoubtedly multifactorial, crop losses due to geminiviral disease remain high on the African continent, seriously undermining both the food and economic security of the over 300 million sub-Saharan Africans who are dependent on subsistence farming. The SANBI viral pathogen genetics team forms part of a pan-African network of crop scientists and virologists that are conducting the world’s largest ongoing plant pathogenic virus diversity studies. The primary focus of this collaborative group is the comparative analysis of cassava mosaic virus (CMV), tomato yellow leaf curl virus (TYLCV) and maize streak virus (MSV) disease transmission dynamics throughout Africa and the world. We are conducting cutting-edge epidemiological research that has identified the predominant CMV, TYLCV and MSV genotypes that will confront resistant transgenic cultivars in different parts of the continent and the world, and has determined the historical migration pathways, movement rates and heterogeneity in spatiotemporal spread of these viruses across Africa and the world. The virus movement rate estimates and the movement pathways identified in this


study will be important parameters in future disease forecasting efforts that could directly benefit hundreds of millions of small-scale farmers throughout Africa.

Output in the last 12 months: 2 publications. The Time-scale of Begomovirus Evolution: Evidence from Integrated Sequences in the Nicotiana genome. Lefeuvre et al. 2011. PLoS ONE 6(5): e19193. doi:10.1371/journal.pone.0019193.

Reconstructing the History of Maize Streak Virus Strain-A Dispersal to Reveal Diversification Hotspots and its Initial Origins in Southern Africa

Gordon Harkins collaborating with (et al):

D. Martin

Institute of Infectious Diseases and Molecular Medicine, UCT

A. Varsani

School of Biological Sciences, University of Canterbury, Christchurch, New Zealand

P. Lefeuvre

CIRAD-Université de la Réunion, Isle de la Réunion

P. Lemey

Department of Microbiology and Immunology, Leuven University, Belgium

Nature and purpose: The A strain of maize streak virus (MSV-A) seriously threatens food security in sub-Saharan Africa. We have generated whole genome MSV-A sequences and, in combination with sampling coordinates and dates, inferred that the virus originated in southern Africa around 1850 and spread across the continent at a rate of approximately 30 km per year. Strikingly, all major contemporary MSV-A lineages arose within the past 50 years from just two well-defined diversification hotspots in south and east Africa. This discovery could dramatically simplify future efforts to monitor the emergence of epidemiologically relevant MSV-A variants.

Output in the last 12 months: Monjane et al. 2011. The Journal of Virology, September, Vol. 85, No. 18, p9623-9636. Journal cover and spotlight section of The Journal of Virology.

Future direction: Gaining an improved understanding of how the virulence of these crop pathogens has changed since their initial emergence as serious agricultural pests in the 17th and 18th centuries and since the widespread growth of resistant (both conventionally bred and transgenic) improved crop genotypes.

ENSEMBL fungal computational framework

Alan Christoffels collaborating with:

Jasper Rees

Agricultural Research Council

Dan Lawson

European Bioinformatics Institute, UK

Nature and purpose: Next generation sequencing technology was used to sequence the fungus Venturia inaequalis, which infects apples. Together with data from the host, this large-scale data can provide insight into the genetic basis for the fungal interaction with the apple plant. Using the ENSEMBL open-source computational framework, we are developing a system for storing and mining genomic data generated in South Africa.

Output in the last 12 months: We have implemented a local version of ENSEMBL at SANBI and have carried out an initial assembly of the Venturia genome. A preliminary annotation of the Venturia genome was produced using a newly implemented method.


One of our MSc students has spent one month at the EBI in Cambridge as part of the skills transfer to SANBI.

Future direction: The fungal annotations will be integrated with existing fungal genome data in ENSEMBL. Development of a method to filter next generation sequencing data for improving genome and transcriptome assembly.

Computational discovery of carotenoid pathway regulatory networks

Alan Christoffels collaborating with:

E. Wurtzel

Lehman College, City University of New York

Nature and purpose: Vitamin A deficiency is associated with the consumption of food crops that are poor sources of provitamin A. However, the incomplete understanding of the regulatory pathway at the systems level limits our ability to predictably control carotenoid content and composition in cultivars grown around the world. In this project we aim to discover transcriptional regulatory mechanisms controlling plant carotenogenesis.

Output in the last 12 months: Produced an inventory of genes responding to environmental stress and implicated in the carotenogenesis pathway. Identified co-regulated carotenogenesis genes.

Future direction: Prepare a manuscript for publication.

Understanding the molecular mechanisms behind resistance to CCR5 antagonists

Simon Travers collaborating with:

David Robertson, Simon Lovell

University of Manchester, UK

Grace McCormack

National University of Ireland Galway, Ireland

Pfizer Global Research and Development, Sandwich, Kent, UK

Nature and purpose: Use data from the Phase III clinical trials of Pfizer’s CCR5-antagonist maraviroc to understand the viral mechanisms of resistance to CCR5-antagonists.

Outputs in the last 12 months: Two completed manuscripts, which are currently on hold by Pfizer for confidentiality reasons. Graduation of a PhD student (Conor Meehan, NUI Galway).

Future direction: Continue analysing the clinical trials data and use subsequent sequence data obtained from Pfizer to develop sensitive genotypic methods to predict the coreceptor usage of an individual’s viral population, with the intention of predicting the potential for resistance prior to CCR5 antagonist therapy initiation.

HIV genotypic analysis as part of the Karonga Prevention Study (KPS), Malawi

Simon Travers collaborating with:

Grace McCormack

National University of Ireland Galway, Ireland

The London School of Hygiene and Tropical Medicine, UK

KPS, Chilumba, Malawi


Nature and purpose: Characterisation and molecular epidemiology of HIV in Karonga District in Northern Malawi.

Outputs in the last 12 months: Paper published in AIDS Research and Human Retroviruses documenting the prevalence of viral drug resistance mutations in treatment-naive individuals. Paper published in AIDS Research and Human Retroviruses studying the viral factors associated with long-term survival of HIV-infected individuals. Successful implementation of 454 ultra-deep sequencing of 15 samples from 5 individuals, documenting the prevalence and emergence of minor variant drug-resistant virions (manuscript in preparation).

Future direction: Further studies of the prevalence and emergence of drug resistance in individuals in response to antiretroviral therapy. Characterisation of novel recombinant strains and study of the viral factors for long-term survival in a number of individuals identified in the cohort. Use of sequence data and biological samples to characterise the emergence of CXCR4-usage in subtype C infected individuals. Use of sequence and geographical data to study transmission networks in the cohort.

Understanding HIV-1 drug resistance using 454 ultra-deep pyrosequencing

Simon Travers collaborating with:

Gert van Zyl

NHLS Tygerberg and Stellenbosch University

Nature and purpose: Using 454 sequencing to characterise low abundance drug resistant viral variants in individuals failing first and second line therapy. Outputs in the last 12 months: Two manuscripts. One currently under second review in Journal of Virology and another for submission to Journal of Virology in early 2012. Future direction: Continue to study ultra-deep sequence data and HIV drug resistance. Towards cost-effective HIV drug resistance testing. Simon Travers collaborating with: Prof Wendy Stevens Dr Gillian Hunt Dr Leigh Berrie Prof Maria Papathanasopoulos

Head, Department of Molecular Medicine and Haematology, National Priority Program Centre for HIV / STI, National Institute for Communicable Diseases Co-Director, Genotyping Laboratory, Department of Molecular Medicine and Haematology

Nature and purpose: Explore the use of 454 sequencing to develop a cost-effective, high-throughput approach for HIV drug resistance testing.

Outputs in the last 12 months: 454 sequencing performed on 642 samples from patients with known treatment outcome. Computational pipeline developed for sequence data management and analysis. Manuscript in preparation for submission to New England Journal of Medicine.
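One step that a deep-sequencing resistance pipeline of this kind needs is estimating how often a given residue appears at a site and flagging low-abundance variants. The sketch below is a hypothetical illustration and not the pipeline developed in this project. It assumes reads have already been quality-filtered, aligned and translated, and it uses two illustrative thresholds that are assumptions rather than values from the report: a 1% floor to discount sequencing error, and a 20% ceiling standing in for the sensitivity limit of conventional population sequencing.

```python
from collections import Counter

def residue_frequencies(reads, position):
    """Frequency of each residue at one alignment column (0-based), ignoring gaps."""
    counts = Counter(
        read[position]
        for read in reads
        if position < len(read) and read[position] != "-"
    )
    depth = sum(counts.values())
    if depth == 0:
        return {}
    return {residue: n / depth for residue, n in counts.items()}

def minor_variants(frequencies, floor=0.01, ceiling=0.20):
    """Residues at or above the assumed error floor but below the assumed ceiling."""
    return {r: f for r, f in frequencies.items() if floor <= f < ceiling}

# Hypothetical aligned, translated reads covering three codons.
# Most reads carry K at the middle position, a minority carry N, a few have a gap.
reads = ["AKC"] * 950 + ["ANC"] * 45 + ["A-C"] * 5

freqs = residue_frequencies(reads, 1)
low_abundance = minor_variants(freqs)
print({residue: round(f, 4) for residue, f in low_abundance.items()})
```

Because gapped reads are excluded from the depth, the frequency is computed over the 995 reads that actually cover the site rather than over all 1000.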


Future direction: Use the data acquired in the first phase of the project to determine the clinical relevance of using 454 sequencing for HIV drug resistance testing. Based on these results, implement a high-throughput pipeline from blood sample to result.

Host-pathogen interactions in sleeping sickness

Alan Christoffels collaborating with:

Prof Mark Field

Cambridge University, UK

Prof Henry Nyongesa

Computer Science Department, UWC

Nature and purpose: The flagellar pocket of T. brucei represents a location where trypanosome and human immune proteins interact. Furthermore, trypanosome-human protein complexes are ingested at the flagellar pocket and are trafficked via an elaborate transport system to vesicles where they are degraded. The limited human-trypanosome protein interaction data led us to use machine-learning approaches to identify key protein interactions in collaboration with Prof Nyongesa. Proteins such as Rab have been extensively studied in the trafficking process through the trypanosome, unlike SNAREs. Besides the protein-protein interaction (PPI) predictions, we computationally identify the spectrum of SNAREs in trypanosomes, coupled with cell localization assays to confirm the cellular location of these SNAREs, in collaboration with Prof Mark Field.

Outputs in the last 12 months: A PhD student, Edwin Murungi, computationally identified 24 SNAREs in T. brucei. He then spent two months in Cambridge carrying out cell localization assays on 4 SNARE proteins predicted for the trypanosome SNARE repertoire. Using a dataset of trypanosome flagellar proteins and human immune proteins, Edwin predicted human-trypanosome protein interaction networks.

Future direction: Two manuscripts are being prepared for publication. Other machine learning techniques will be assessed to improve algorithm performance in the prediction of protein-protein interactions.

Characterisation of miRNAs in A. funestus

Alan Christoffels collaborating with:

Prof Lizette Koekemoer

National Institute for Communicable Diseases

Nature and purpose: miRNAs have been shown to play a regulatory role in fine-tuning gene expression. The majority of mosquito miRNAs were identified in A. gambiae, while no miRNAs have been identified in A. funestus, an important vector on the African continent. Using next generation sequencing technology, we aim to identify miRNAs in A. funestus and predict the miRNA targets.

Outputs in the last 12 months: Small RNAs were isolated from A. funestus and sequenced using Illumina technology. These small RNAs were screened computationally for miRNAs and classified into various categories. miRNA targets were identified using three algorithms and filtered using gene enrichment analysis. Data have been shared via a web portal (insectar.sanbi.ac.za).

Future direction: A number of insect genomes have been sequenced in the past 2 years, including a few blood-feeding vectors. We will compare miRNAs in these disease vectors to identify key mechanisms for parasitic control.


Human Genetic Susceptibility to Tuberculosis

Alan Christoffels collaborating with:

Eileen Hoal van Helden

Stellenbosch University

Peter Witbooi

Mathematics Department, UWC

Nature and purpose: Intra-species protein-protein interaction (PPI) predictions have been attempted for a range of organisms even though the false positive rate remains high. In this project we are attempting to use supervised and unsupervised algorithms to predict the interactions between human and mycobacterium proteins.

Outputs in the last 12 months: The limited experimental PPI data for human-mycobacteria interactions has shifted our strategy to the application of Bayesian techniques to measure the interactions between human and M. tuberculosis proteins.

Future direction: With new experimental data on the horizon, we will be exploring clustering techniques to enrich for human-mycobacterium PPI.


Financials

The total funding secured at SANBI was R10 380 585.91 for the year 2011. 63% of SANBI funding was secured from South African agencies, 24% from UWC and 13% from foreign donors. The South African Research Chair programme provided the largest portion at 41%, followed by UWC at 28%, the SA MRC at 18% and other NRF projects at 13%.
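As a rough guide to what these shares mean in rand terms, the percentages quoted for the distribution of income from all sources can be applied to the total. The sketch below is illustrative arithmetic only: the report gives rounded percentages, so the resulting amounts are approximations and not audited figures, and the variable and function names are introduced here for the example.

```python
from decimal import Decimal, ROUND_HALF_UP

# Total 2011 funding in ZAR, as stated in the report.
TOTAL_FUNDING = Decimal("10380585.91")

# Rounded percentage shares of income from all sources, as stated in the report.
INCOME_SHARES = {
    "South African agencies": Decimal("63"),
    "UWC": Decimal("24"),
    "Foreign donors": Decimal("13"),
}

def share_amounts(total, shares):
    """Convert percentage shares into rand amounts rounded to the nearest cent."""
    if sum(shares.values()) != Decimal("100"):
        raise ValueError("shares must sum to 100%")
    cent = Decimal("0.01")
    return {
        name: (total * pct / 100).quantize(cent, rounding=ROUND_HALF_UP)
        for name, pct in shares.items()
    }

amounts = share_amounts(TOTAL_FUNDING, INCOME_SHARES)
for name, amount in amounts.items():
    print(f"{name}: approximately R{amount:,.2f}")
```

Decimal arithmetic is used so that the cent-level rounding is explicit; with floating point the same calculation could drift by a cent.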

The average ratio of expense to income remained consistent at 77%. Of the total income, salaries accounted for 42%, bursaries for 21% and running costs for 37%.

A number of exciting funding applications were submitted during 2011, including (1) the NIH call for funding on Human Heredity and Health in Africa (H3Africa), (2) a national application to the Department of Science and Technology, SA, for the establishment of a Southern African human genome and (3) a renewal application for the DST/NRF Research Chair in Bioinformatics and Public Health Genomics.

Income and Expenditure trends 2004 – 2011:

Distribution of income from all sources:

Distribution of income from SA sources:

Detailed expense report for SANBI:

Funder, Salaries, Postdocs, Doctorals, Masters, Printing, Running, Internet, Telecoms, Travel, Overheads, Equipment, Total

51

S ANBI So uth Af r ic a n Na t iona l Bioinf or m a t ic s I n st i t u t e 52

Financials cont.

Salaries

Total

Travel

108,781

Overheads Equipment

Postdocs Doctorals Masters Printing Running Internet Telecoms

71,916 1,648,432

146,618

97,683

17,005

5,970

26,275

194,123

609,163

5,576

10,423

53,978

31,701

125,500

12,466

89,342

71,713 3,220,427

15,851

1,001,150

66,732

2,995

650

18,916

15,056

4,676

54,000

312

10,638

101,918

2,349

427,336

51,188

1,778,708 9,101,524

342,104

1,595,690 1,595,690

317,784

468,488

51,128

68,704

40,000

30,830

34,984

9,080

90,811

757,272

91,279

90,967

295,000

730

72,241

50,000

398,978

157,256

81,602

565,000

864,500

80,000

490,000

160,000

680,830

5,970

66,082

2011 Detailed expense report for SANBI (ZAR): Funder 1,282,965

187,863

SA MRC Atlantic Philanthropies

-

3,826,002

200,165

1,153,187

27,260

974,563

World Health Organisation Dean’s Budget NRF Thuthuka NRF Blue Skies NRF Research Chair NRF Vitamin A NRF ENSEMBL DVC Capital Centre for Diseases Control TOTAL


SANBI 2011 End-of-Year Party

At the end of yet another productive year, SANBI staff and students spent an enjoyable day at Ratanga Junction. The day was filled with team-building activities, and lunch was enjoyed by all. A range of awards was handed out at the event, and everyone looked forward to a well-deserved break.


ALUMNI

Staff:
Name | Institution
Winston Hide | Associate Fellow, Ludwig Institute for Cancer Research; Affiliate Faculty, Harvard Stem Cell Institute; Associate Professor of Computational Biology and Bioinformatics, Department of Biostatistics, Harvard School of Public Health
Vladimir Bajic | Director & Professor: Computational Bioscience Research Center, King Abdullah University of Science and Technology
Heikki Lehvaslaiho | Senior Research Scientist: Computational Bioscience Research Center, King Abdullah University of Science and Technology
Tulio de Oliveira | Senior Bioinformatics Researcher: Africa Centre for Health and Population Studies, University of KwaZulu-Natal
Nicky Mulder | Group Head: Computational Biology Group, University of Cape Town
Cathal Seoighe | Stokes Professor of Bioinformatics: School of Mathematics, Statistics and Applied Mathematics, National University of Ireland, Galway

Postdoctoral Fellows:
Name | Level of study | Date completed | Currently
Soraya Bardien-Kruger | PostDoc | Jun-02 | University of Stellenbosch
Vladimir Babenko | PostDoc | Jun-02 | Senior Staff Scientist, IC&G
Janet Kelso | PostDoc | Oct-04 | Max Planck Institute for Evolutionary Anthropology
Raphael Isokpehi | PostDoc | Dec-04 | Director of the Center for Bioinformatics & Computational Biology at Jackson State University
Konrad Scheffler | PostDoc | Feb-05 | Theodore Gildred Research Facility, University of California, San Diego
Nicki Tiffin | PostDoc | Dec-05 | Senior Lecturer, SANBI, UWC
Gwen Koning | PostDoc | Dec-06 | Global Seed Core Manager, Syngenta Crop Protection AG, Basel, Switzerland
Chris Maher | PostDoc | Dec-07 | Assistant Professor, Washington University School of Medicine
James Patterson | PostDoc | Jun-09 |
Adam Dawe | PostDoc | Aug-09 | Research Scientist, KAUST
Sunil Sagar | PostDoc | Aug-09 | Research Scientist, KAUST
Mandeep Kaur | PostDoc | Aug-09 | Research Scientist, KAUST
Stuart Meier | PostDoc | Aug-09 | Research Scientist, KAUST
Adele Kruger | PostDoc | Feb-10 | Wayne State University, Detroit, Michigan
Oliver Hofmann | PostDoc | Feb-10 | Affiliated Faculty, Harvard Stem Cell Institute; Associate Director at Harvard School of Public Health
Sundarajan Seshadri | PostDoc | Nov-10 | Nanyang Technological University, Singapore
Ashley Pretorius | PostDoc | Dec-10 | Senior Lecturer, Biotechnology, UWC
Jacob Tsotetsi | PostDoc | Dec-11 |

PhD:
Name | Level of study | Date completed | Currently
Alan Christoffels | PhD | 2001 | Interim Director, SANBI, UWC; NRF Research Chairholder
Ekow Oppon | PhD | 2002 | SA MRC
Junaid Gamieldien | PhD | 2002 | Senior Lecturer, SANBI, UWC
Zhuo Zhang | PhD | 2007 | Research Scientist, University of Singapore
Allen Chong | PhD | 2009 | Research Fellow, Beth Israel Deaconess Medical Center, Harvard Medical School
Magbubah Essack | PhD | Sep-09 | Research Scientist, KAUST
Sebastian Schmeier | PhD | Sep-09 | Research Scientist, KAUST
Ulf Schaefer | PhD | Sep-09 | Research Scientist, KAUST
Mark Wamalwa | PhD | Sep-11 | International Livestock Research Institute, Kenya
Musa Gabere | PhD | Sep-11 | University of Namibia, Mathematics Department
Samuel Kwofie | PhD | Sep-11 | UWC Postgraduate Office
Aleksander Radovanovic | PhD | 2010 | Research Scientist, KAUST

MSc:
Name | Level of study | Date completed | Currently
Bukiwe Lupindo | MSc | 2005 | SA Government Administration
Cameron MacPherson | PhD | 2009 | PhD, KAUST
Tzu-Ming Chern | MSc | Mar-03 | Switzerland, IT
Elana Ernstoff | MSc | Dec-03 |
Estienne Swart | MSc | Dec-03 | Graduate Student, Princeton University
Victoria Nembaware | MSc | Dec-03 | Post doc, UCT
Zayed Albertyn | MSc | Dec-03 | Bioinformatics Director, Malaysia
Anelda Boardman | MSc | Mar-04 | Stellenbosch University, Sequencing Facility Manager
Faisel Mosoval | MSc | Mar-05 | Senior Professional Officer, Information Systems and Technology, Business Applications, Business Intelligence and Spatial Development, City of Cape Town
Nothemba Gwija-Kula | MSc | Mar-05 | MRC
Farahnaz Ketwaroo | MSc | Dec-05 | PhD, UCT
Mario Jonas | MSc | Mar-06 | Web Administrator, SANBI, UWC
Oliver Bezuidt | MSc | Dec-07 | PhD, University of Pretoria
Eugene Duvenhage | MSc | Mar-09 | Software Developer, Corporate
Frederick Kamanu | MSc | Mar-09 | PhD, KAUST
Feziwe Mpondo | MSc | Sep-09 | MRC, Research Scientist
Saleem Adam | MSc | Sep-11 |
Firdous Khan | MSc | Mar-12 | PhD, UWC Biotechnology Department

Honours:
Name | Level of study | Date completed
Clifford Omorogie | Hons | Dec-01
Grant Carelse | Hons | Dec-02
Thurayah Davids | Hons | Dec-05
Halimit Ebrahim | Hons | Dec-09
Katlego Motlhatlego | Hons | Mar-12
Siyanda Tsaba | Hons | Mar-12
Stacey Moses | Hons | Mar-12

Currently: MSc, UWC Biotechnology Department (listed for two graduates)


SANBI | South African National Bioinformatics Institute

Postal Address: South African National Bioinformatics Institute | University of the Western Cape | Private Bag X17 | Bellville | 7535
Physical Address: South African National Bioinformatics Institute | 5th Floor | Life Sciences Building | University of the Western Cape | Modderdam Road | Bellville | 7530 | South Africa
Telephone: +27 (0)21 959-3645 | Facsimile: +27 (0)21 959-2512
Mailing List: [email protected] | Email: [email protected]
Website: www.sanbi.ac.za
