South African National Bioinformatics Institute
Research Institute at the University of the Western Cape since 1997
South African Medical Research Council Bioinformatics Capacity Development Unit since 2002
World Health Organisation Tropical Disease Research regional training center since 2004
Department of Science and Technology National Research Foundation Research Chair in Bioinformatics and Public Health Genomics since 2007
Contents
Policy Mandates
Vision, Mission, Goals
Director’s Message
2011 Overview
Staff
Capacity Development
    Undergraduate Training Programme
    Postgraduate Training Programme
    SANBI Graduations
    Conferences, Workshops and Courses organised by SANBI
    Internships
    Conference Participation
Computational Resources
Awards and Honours
SANBI in the Media
Community Engagement
Research Outputs
    Summary
    Journal Publications
    Chapters in Books
    Software and Similar Outputs Developed
    Keynote and Invited Plenary
    Conference Presentations or Posters
    Theses
    Expert Panel or Professional Membership
    Policy Briefs
    Intervention Programmes
Research Projects Overview
Research Labs
    Alan Christoffels
    Junaid Gamieldien
    Gordon Harkins
    Nicki Tiffin
    Simon Travers
Research Collaborations
Financials
End of Year Party
Alumni
SANBI South African National Bioinformatics Institute
POLICY MANDATES

National Strategic Plan for HIV/AIDS, STIs and TB (2012 - 2016)
The vision and mission of SANBI align with Draft Zero of the National Strategic Plan (NSP). This draft specifies Research and Innovation as a key enabler of the NSP, and proposes that “relevant research provides information and the impetus for innovation within the implementation of the NSP”, and that strategic priorities should include “concrete plans to improve capacity for Research” and “a budget for research”.

The Department of Science and Technology’s Ten Year Innovation Plan (2008 - 2018)
One of the five Grand Challenge areas specified in this Plan is the “Farmer to Pharma” value chain to strengthen the bioeconomy. SANBI’s genomics programme, which straddles both communicable and non-communicable diseases, aligns clearly with this Grand Challenge.

The MRC Act (Act 58 of 1991)
As an extramural unit of the MRC, SANBI falls under the legislative and other mandates of the MRC. In Section 3, this Act states that the Legislative Mandate of the MRC is: "Through research, development and technology transfer, to promote the improvement of the health and quality of life of the population of the Republic, and to perform such functions as may be assigned to the MRC or under this Act."
Annual Report 2011
Vision
To become a centre of global, African and South African excellence, achieving the highest levels in biomedical research and education.

Mission
To conduct cutting edge bioinformatics and computational biology research relevant to South African, African and global populations.
To develop human resources in bioinformatics and computational biology by educating and mentoring scientists.
To increase awareness of and access to bioinformatics and computational biology resources.

Goals
To generate and publish high quality, relevant biomedical research.
To train and graduate competent and productive researchers.
To add value to the academic program of the University of the Western Cape.
To enhance other research fields through collaborative projects.
To establish sources of renewable funding to pursue the mission of the institute.
Director's Message
It is with much excitement that I can reflect on 2011 in the context of bioinformatics development both locally and across the African region. The front cover of our annual report each year captures a significant event pivotal to our yearly activities. The large cohort of 6 MSc and PhD graduates in 2011 was a proud moment for the institute as we renew our commitment to develop bioinformatics capacity in South Africa and the African continent. Our recent graduates have either continued with PhD studies or have taken up faculty positions locally and abroad. During 2011 we co-hosted the African Society for Bioinformatics and Computational Biology conference in Cape Town and we hosted 98 African scientists for a 2-day workshop at our training facility on the University of the Western Cape campus. These activities will continue to expand in 2012 through our mandate as the MRC bioinformatics capacity development unit, especially with the increased need for expertise in next generation sequencing analysis.
Our staff recruitment exercise that was completed in 2010 has cemented our diverse research portfolio and arguably places us as the forerunners in bioinformatics research on the African continent. We have retained a funding stream from both local (87%) and international (13%) donors and acknowledge the generous support from the Department of Science and Technology/National Research Foundation Research Chair programme and the South African Medical Research Council in a climate where global research funding has declined. The future certainly does not look bleak with the announcement of the National Institutes of Health and Wellcome Trust’s commitment to support research on the African continent through the Human Heredity and Health in Africa (H3Africa) programme.

The University of the Western Cape has contributed generously to the strategic acquisition of research equipment at SANBI and has supported our purchase of a high performance Dell machine (88 CPUs and 512 GB of memory). This hardware is being integrated into our existing infrastructure and will also support other computing-intensive research on campus such as the Astronomy Research Chair Programme. Our technical staff has made impressive progress with the development of an in-house cloud solution and virtual environment that facilitates distributed computing with ease. We believe that these solutions can be of value to other scientific computing groups locally and abroad.
I congratulate all our staff and students on their impressive strides in seeking biomedical solutions to diseases plaguing our society and contributing to a vibrant research culture at SANBI and on the university campus.
Professor Alan Christoffels
DST/NRF Research Chair in Bioinformatics and Health Genomics
2011 Overview

Highlights:
• 1 MSc and 5 PhD graduates.
• 5 students attended international conferences presenting research, while 4 students enjoyed internships internationally (3) and nationally (1).
• 12 high-impact publications and 8 keynote addresses.
• Unit Director, Alan Christoffels, was elected to the Academy of Science of South Africa.
• Two international training workshops were held and attended by 75 participants from multiple African countries.
• Three training workshops were held for students and researchers in South Africa.
• Organised the African Society for Bioinformatics and Computational Biology meeting that was hosted in Cape Town.
• Acquisition of a high performance Dell machine with 512 GB of memory to expand our national mandate of providing bioinformatics research support.
• 6 unique public resources for the biomedical community contributed by SANBI.
Research objectives over the last 12 months:
• To provide relevant bioinformatics services to HIV, Hepatitis C, tuberculosis and sleeping sickness researchers.
• To provide bioinformatics research capacity development and training in the key health domains of HIV, TB, the control of Tsetse fly (Glossina morsitans) and other disease vectors.
• To develop and provide analytical algorithms to discover genes that contribute to the development of complex diseases.
• To develop and provide a sequence database and genome annotation tools to describe the molecular epidemiology and drug resistance profile of HIV.
• To develop a software solution for user-friendly HIV drug resistance testing using the 454 sequencing platform.
• To develop a software solution for identifying nucleotide variation in Mycobacterium tuberculosis genomes.
• To build a research program to identify and characterise biologically relevant secondary structures in nucleotide alignments of pathogenic RNA viruses.
Progress achieved:
The unit delivered 12 health-related publications containing significant discoveries in 2011. These include articles in high-impact journals such as Nature Genetics (1), Journal of Virology (1) and AIDS Research (1), as well as one book chapter on disease gene prioritisation. The publications contribute to a total journal impact factor of 76.211. Unit researchers were invited to present 8 talks nationally, in Africa and internationally.

Over the past year, 96% of the 27 postgraduates qualified as historically disadvantaged and 37% were female. Of the 7 postdoctoral fellows, 86% were historically disadvantaged and 43% were female.

As a leading HIV research team, we have contributed to the understanding of HIV CXCR4-usage during disease progression. In Malawi, we have investigated and published the prevalence of drug resistance in a treatment-naïve population of HIV infected individuals. We showed that drug resistance mutations in patients prior to treatment had no effect on treatment outcome.
In South Africa we investigated the evolution of HIV-1 subtype C viruses in the female genital tract relative to the blood plasma through a longitudinal study of the sequence diversity in HIV-1 infected patients during acute and chronic infection. Our virology researchers are also investigating the degree to which RNA viral evolution is constrained by secondary structure in the families Caliciviridae and Picornaviridae (one of the most genetically diverse of the positive-sense single-stranded RNA viral families and the most common cause of infections in humans in developed countries). This work has led to the development of a viral RNA secondary structure prediction algorithm.

Through participation in international efforts such as the International Glossina Genome Initiative (IGGI), SYSCO consortium and the HIV Dynamics Meeting, the Unit has maintained strong international links, and has benefited from capacity development and skills development. In particular, we have organised the international HIV Dynamics conference held in Ireland. Internationally, SANBI is synonymous with the leading African bioinformatics effort. The unit has continued its contribution to the WHO-sponsored analysis of the recently sequenced and assembled Glossina morsitans genome and we continue to host a sleeping sickness portal for researchers on the African continent.

During 2011, the unit contributed significantly to a proposal to the NIH for a Pan African Bioinformatics Network that would support the H3Africa (www.h3africa.org) funded genetics projects. SANBI will serve as a centre of excellence to support genetics researchers in Southern Africa. SANBI has also partnered with clinicians and geneticists in pursuing genomic approaches to understanding diseases under this initiative.

We have tested the utility of our internationally competitive semantic database to clarify the often unclear or unapparent links between novel mutations recently reported in the literature and the diseases or phenotypes being investigated, and have found that we can often provide better insights than the original publication. Our system often uncovers the underlying biological mechanisms that lead to the development of phenotypes associated with a disease.
An active collaboration with the TB centre of excellence at Stellenbosch Medical School has led to the development of methodology to analyse large volumes of sequencing data for Mycobacterium tuberculosis and the identification of virulence associated mutations. Through an international effort led by SANBI we have modeled four novel TB drug targets that are being subjected to docking studies. This work provided the impetus for a PhD student within the unit to spend two weeks in a structural biology laboratory in Spain.

SANBI’s expertise in next generation sequencing methodologies has led to the development of an exome sequencing and knowledge-discovery pipeline to identify the genetic cause(s) of disease in a patient with Multiple Sclerosis. This protocol allows for the identification of rare mutations associated with phenotypes that would normally be missed. We are developing a novel concept to prioritise mutations by mining the database for genes associated with ‘surrogate’ or ‘secondary’ phenotypes linked to disease (e.g. demyelination in multiple sclerosis).

Our algorithm development effort has achieved a significant breakthrough with the development of a statistical method to identify driver genes in the development and metastasis of breast cancer. Our method combines and re-analyses large numbers of data sets that may have been generated on different technology platforms as a means to increase the statistical power of the meta-analysis, while weakening the effects of individual study-specific biases.
Staff
Our academic staff comprises 5 principal investigators supervising a total of 27 postgraduate students. Our computational research environment is maintained by a dedicated systems administrator, a software developer and a database administrator. Administrative support to both students and staff is provided by 4.5 administrative staff. The Director position at SANBI is currently filled by an interim Director, Alan Christoffels, who holds the DST/NRF Research Chair in Bioinformatics and Public Health Genomics. Applicants are requested to contact Professor Christoffels for further information.
SANBI 2011 staff:
Samantha Alexander Assistant to the DST/NRF Research Chair
Alan Christoffels DST/NRF Research Chair in Bioinformatics and Health Genomics, Interim Director: SANBI
Junaid Gamieldien Senior Lecturer
Dale Gibbs Systems Administrator
Gordon Harkins Senior Lecturer
Mario Jonas Database Administrator
Fungiwe Mpithi Receptionist
Ferial Mullins Finance Administrator
Maryam Salie Student Administrator
Nicki Tiffin Senior Lecturer
Simon Travers Associate Professor
Peter van Heusden Senior Systems Administrator
Junita Williams HR Administrator
Vladimir Bajic External Professor
Winston Hide External Professor
Staff development
Continuous staff development is encouraged at SANBI, and during the past year staff have developed skills to enhance their work performance through workshops and formal degree studies.

Conferences/workshops attended by staff:

Staff member: Peter van Heusden
Conference/Workshop: Galaxy Community Conference 2011, Lunteren, Netherlands.
Benefit: This conference gave us an invaluable opportunity to network with others working to make bioinformatics workflows more accessible to life scientists. The speakers presented the latest developments in the Galaxy platform as well as examples of how they adapted Galaxy to meet the needs of their local software environment. Insights gleaned from the talks and from face to face networking with conference participants have informed IT and bioinformatics workflow planning at SANBI.

Staff member: Nicki Tiffin
Conference/Workshop: Workshop on Empowering Genomics in Southern Africa: Application to Infectious Disease, Limpopo, June 2011.
Benefit: Learning new technologies for genomics research for infectious diseases in Africa. Co-hosted by the J Craig Venter Institute and the University of Limpopo.

Staff member: Ferial Mullins
Conference/Workshop: 2011 Global Finance Conference, Dubai, November 2011.
Benefit: The conference aimed to bring together finance managers of the private and public sectors to work through strategies for sharing funding or financing between the sectors, e.g. how the private sector can become a donor to the public sector and vice versa.
Enrolled staff:
Staff member | Degree | Institution | Enrolled or graduated | Graduation month
Peter van Heusden | Honours Part-time, Information Technology | University of South Africa | Enrolled | December 2012
Strategic Planning Session
Annual strategic review and planning for the following year occurs during November/December of an academic calendar year. In 2011, staff at SANBI held their planning session on 1-2 December at Feathers Lodge, Durbanville. This breakaway session provided an opportunity to build inter-personal relationships and streamline processes within the institute to deliver on our mandate of providing national bioinformatics training and internationally competitive research.
SANBI Staff at the 2011 Strategic Planning Session
Capacity Development

Undergraduate Training Programme
During 2011 Thapelo Mohotsi, a computer science BSc graduate, was recruited for 12 months as an intern working on the development of a web resource to capture and compare genetic profiles of 100 individuals. This project has cemented an exciting collaboration with the Forensics Laboratory at the University of the Western Cape.
Third year Biotechnology module
SANBI lecturers taught an introductory bioinformatics course, BTN323, to 53 third-year Biotechnology students.

2011 SANBI Postgraduate Registration:
Student | Gender | Nationality | Degree | Year since first enrolment
Saleema Crous | F | South Africa | MSc | 1
Fredrick Nindo | M | Kenya | MSc | 1
Wisdom Akurugu | M | Ghana | MSc | 1
Emil Tanov | M | South Africa | MSc | 1
Darlington Mapiye | M | Zimbabwe | MSc | 1
Saleem Adam | M | South Africa | MSc | 4
Mmakamohelo Direko | F | South Africa | MSc | 2
Firdous Khan | F | South Africa | MSc | 2
James Matthews | M | South Africa | MSc | 2
Oreetseng Moncho | F | South Africa | MSc | 2
Ram Shrestha | M | Nepal | PhD | 1
Mahjoubeh Jalali | F | South Africa | PhD | 1
Ibrahim Ahmed | M | Sudan | PhD | 1
Emad Fadhal | M | Sudan | PhD | 1
Mushal Ali | M | Sudan | PhD | 4
Ruben Cloete | M | South Africa | PhD | 3
Musa Gabere | M | Kenya | PhD | 4
Zahra Jalali | F | South Africa | PhD | 2
Samuel Kwofie | M | Ghana | PhD | 4
Mbandi Kimbung | M | Cameroon | PhD | 2
Monique Maqungo | F | South Africa | PhD | 4
Sarah Mwangi | F | Kenya | PhD | 2
Edwin Murungi | M | Kenya | PhD | 4
Alecia Naidu | F | South Africa | PhD | 2
Kavisha Ramdayal | F | South Africa | PhD | 2
Mark Wamalwa | M | Kenya | PhD | 4
Adugna Woldesemayat | M | Ethiopia | PhD | 2
Sumir Panji | M | Kenya | PostDoc | 2
Samson Muyanga | M | South Africa | PostDoc | 2
Oupa Tsotetsi | M | South Africa | PostDoc | 2
Barbara Picone | F | Italy | PostDoc | 1
Uljana Hesse | F | Germany | PostDoc | 1
Gordon Jamieson | M | Scotland | PostDoc | 1
Natasha Wood | F | South Africa | PostDoc | 1
Postgraduate Training Programme
Over the past year, 96% of the 27 postgraduates trained through SANBI qualified as historically disadvantaged and 37% were female. Of the 7 postdoctoral fellows, 86% were historically disadvantaged and 43% were female.

2011 SANBI Students
Distribution of postgraduate student registrations for the period 2001 – 2011:

10 students registered for MSc:
Country | Total | Males | Females
South Africa | 7 | 3 | 4
Kenya | 1 | 1 | 0
Ghana | 1 | 1 | 0
Zimbabwe | 1 | 1 | 0

17 students registered for PhD:
Country | Total | Males | Females
South Africa | 6 | 2 | 4
Nepal | 1 | 1 | 0
Sudan | 3 | 3 | 0
Kenya | 4 | 3 | 1
Ghana | 1 | 1 | 0
Cameroon | 1 | 1 | 0
Ethiopia | 1 | 1 | 0

7 registered Postdoctoral fellows:
Country | Total | Males | Females
South Africa | 3 | 2 | 1
Kenya | 1 | 1 | 0
Italy | 1 | 0 | 1
Germany | 1 | 0 | 1
Scotland | 1 | 1 | 0
Some of the UWC Science PhD Cohort, September 2011 Graduation
Musa Gabere
Samuel Kwofie
Monique Maqungo
Conor Meehan
Saleem Adam
Mark Wamalwa
SANBI graduations during 2011:
Name | Degree | Thesis
Musa Gabere | PhD, UWC | Prediction of antimicrobial peptides using hyperparameter optimised support vector machines.
Samuel Kwofie | PhD, UWC | Development of a Hepatitis C virus knowledgebase with computational prediction of functional hypothesis of therapeutic relevance.
Monique Maqungo | PhD, UWC | Prostate cancer knowledgebase with functional genomic data analysis.
Conor Meehan | PhD, National University of Ireland, Galway | Understanding the interaction between HIV-1 and chemokine receptors during host cell entry with focus on the potential for resistance to CCR5-antagonists.
Mark Wamalwa | PhD, UWC | Development of a comprehensive annotation and curation framework for analysis of Glossina morsitans morsitans expressed sequence tags.
Saleem Adam | MSc, UWC | A knowledge base of stress response gene-regulatory elements in Arabidopsis Thaliana.
Conferences, workshops and courses organised by SANBI
2011 was an exciting year in which SANBI co-organised at least 3 international conferences, as outlined below. Workshops to support local researchers and students are listed in an accompanying table.
ISCB Africa ASBCB Conference, 09 – 11 Mar 2011
SANBI staff co-organised the ISCB ASBCB Bioinformatics Conference in Africa, which was held at the Cape Town International Convention Centre and attended by 180 scientists from around the world. Of these, 80% were from 13 African countries. The conference was preceded by two days of workshops with three parallel sessions. These were hosted at SANBI and attended by over 90 participants. We were involved in the initiation of the ASBCB mentorship programme with SANBI faculty members volunteering to act as mentors for African students requiring guidance in their research projects.
The conference shared a session with the Joint International Conference of the African and Southern African Societies of Human Genetics that was held back-to-back with the bioinformatics conference, facilitating interactions and discussion between attendees from both conferences. SANBI director Alan Christoffels chaired the Scientific Committee for the conference, and also chaired the first session on functional genomics. Topics covered during the bioinformatics conference included African genomics, bioinformatics analysis for human genetics, host and pathogen systems biology, database and tool development, molecular epidemiology and evolution, search and design of vaccines and drugs, functional genomics and comparative genomics. Many international speakers gave talks on their research and SANBI was well represented in the oral presentations with Simon Travers presenting his work on characterising HIV resistance to treatment with CCR5 antagonists; Ruben Cloete presenting his work on in-silico TB drug design using comparative genome analysis of DS, MDR and XDR isolates from KZN; Samuel Kwofie presenting his research on inferring enriched biological information from graphs composed of text-derived biomedical concepts of ontologies related to Hepatitis C Virus; and Gordon Harkins presenting his work on the spread of tomato yellow leaf curl virus from the Middle East to the world.
Africa – India Joint Virtual Conference 2011, 10 – 11 Feb 2011
Two SANBI postgrad students, Stanley Mbandi Kimbung and Kavisha Ramdayal, successfully organised the South African hub of the Africa-India Joint Bioinformatics Virtual Conference 2011 (Bifx11). They served as Chair and Technical Chair. This meeting followed the successes of Bifx09 and Bifx10. This joint conference, which saw the participation of hubs in India and Africa, was organised by the Regional Student Groups of Central and Southern Africa and the Bioinformatics Organisation, and supported by Bioclues.org and the African Society of Bioinformatics and Computational Biology. The South African hub was hosted at SANBI and attended by 30 participants drawn from various institutions in South Africa. A major highlight was the presentations by two of SANBI's researchers, Junaid Gamieldien and Simon Travers, entitled “Semantic Integration of Biomedical Knowledge and Existing Data to Support In-Silico Discovery” and “A tale of two pathways: HIV resistance to treatment with CCR5 antagonists” respectively.
18th International HIV Dynamics and Evolution Conference, 01 – 04 May 2011
The HIV Dynamics and Evolution Conference is the premier international conference for researchers who explore HIV evolution and diversity. In 2011 Prof Travers co-hosted the conference, which was held in Galway, Ireland. The conference was attended by almost 140 delegates, with session topics including molecular epidemiology, HIV diversity and phylodynamics, vaccines and drug design, and ultradeep sequencing methods and approaches. On the second night of the conference, a seminar sponsored by SANBI was given by Edwin J. Bernard, who spoke about HIV and the criminal law: combating stigma through science.
NICD 454 Sequencing Platform Launch, 09 Nov 2011
Prof Travers was an invited speaker at a workshop sponsored by Roche at the National Institute for Communicable Diseases (NICD) for the launch of their new 454 Sequencing Platform. He spoke about his work using 454 sequencing to study the viral populations of individuals infected with HIV. In particular he focused on the bioinformatics issues of such work and presented solutions that are currently under development at SANBI. The talk was well received, led to stimulating discussion during the workshop, and has opened up potential collaborations that are currently being explored. An exciting follow-up has been discussed between SANBI staff and Roche to host next generation sequencing analysis workshops in the following year.
Workshops and courses hosted or presented by SANBI:

Course: National Bioinformatics Workshop
Nature and purpose: A 7-week bioinformatics course that is held 5 days a week from 9am till 5pm. This course aims to give postgrad students an introduction to a range of bioinformatics topics.
Target audience: Postgraduate students in South Africa.
Benefit: Students develop in-depth knowledge of key topics that informs their postgraduate thesis design.

Course: Introduction to databases/gene expression analysis and interaction pathways
Nature and purpose: Course offered in collaboration with the European Bioinformatics Institute. The material is designed to teach skills in genetic data analysis.
Target audience: 47 biomedical researchers on the African continent, including South Africa.
Benefit: The combined theoretical and practical sessions give participants ‘real’ data to explore.

Course: Population genetics
Nature and purpose: A course designed to give researchers, in particular genetics researchers, a combination of theoretical and practical insights into population genetics analyses.
Target audience: 22 biomedical researchers on the African continent.
Benefit: Understanding of popular analyses such as genome wide association studies.

Course: Africa-India Joint Virtual Conference 2011 (Bifx11)
Nature and purpose: A virtual conference with 6 speakers, a tutorial and group discussion to encourage engagement between students and researchers to study interactions of pathogens, hosts and vectors in relevant diseases.
Target audience: 140 attendees from across the globe, including 30 participants drawn from various institutions in South Africa.
Benefit: To foster virtual interactions and collaborations among students, as well as researchers, of Africa and India and help to further the advancement of science there.

Course: ENSEMBL
Nature and purpose: To train biologists how to access genome variation information and gene content data for various species. The last day was devoted to accessing the genome data via automated scripts.
Target audience: Students and researchers in South Africa.
Benefit: A plethora of precomputed genomics data has been generated internationally; access to this information would inform new experiments and provide medical researchers with tools to investigate their disease of interest.

Course: University of Mauritius Masters Course
Nature and purpose: Lectures in population genetics and disease, genotype data, genome-wide association studies, computational disease-gene prioritisation approaches, modes of gene dysregulation and functional predictions for single nucleotide polymorphisms in disease.
Target audience: Students studying for a Masters in Bioinformatics degree at the University of Mauritius.
Benefit: The students attended lectures, and were also exposed to hands-on approaches through tutorials on all the subjects.

Course: EBIOKIT
Nature and purpose: EBIOKIT is a standalone server that contains the necessary bioinformatics tools needed by researchers. The course aims to teach researchers how to maintain a local copy of this kit and, as a consequence, provide bioinformatics support.
Target audience: 17 biomedical researchers on the African continent.
Benefit: Research laboratories can have their in-house bioinformatics toolkits without the dependence on internet resources.
Internships for SANBI students
The postgraduate training experience at SANBI includes opportunities to engage with overseas institutions to develop specific skill sets needed for a project. During 2011, 4 students visited various international and South African laboratories.
International internships:

Ruben Cloete
10 – 21 January 2011
Centro de Investigacion Principe Felipe (CIPF), Valencia, Spain
Objectives of visit:
• Gain insight into the basic concepts of comparative modeling and docking of ligands.
• Learn to use the MODELLER software and the AutoDock Vina package.

Oreetseng Moncho
4 – 24 October 2011
European Bioinformatics Institute, Hinxton, Cambridge
Objective of visit:
• Develop skills to implement a local version of an ENSEMBL webserver for agriculture research data visualization.

Edwin Murungi
14 Mar – 02 July 2011
Mark Field’s Laboratory, Department of Pathology, Cambridge University
Objective of visit:
• Use immuno-fluorescence microscopy to identify sub-cellular localization of SNARE proteins expressed in trypanosomes grown in culture.

National internship:

Mushal A. M. A. Ali
5 June – 5 August 2011
Vector Control Reference Unit (VCRU), National Institute for Communicable Diseases (NICD), Johannesburg, South Africa
Objectives of visit:
• Identify and characterise microRNA expressed in the different developmental periods (eggs, larvae, pupae and adults) of the second major malaria vector in Africa, Anopheles funestus.
• Anopheles mosquito rearing and identification.
• RNA extraction, purification and quantification.
Conference participation
During 2011 students had opportunities to interact with foreign institutions and some participated in local and international conferences.
International Conferences: 5 students attended international conferences and presented their research.
Kavisha Ramdayal
26 – 9 March 2011
7th Biovision World Life Sciences Forum, Lyon, France
• Authored an article for Naturejobs covering one of the BioVision sessions, available online at http://blogs.nature.com/naturejobs/2011/04/07/tech-savvy-scientists-needed-for-healthcare-innovation.
Firdous Khan
15 – 20 July 2011
19th Annual International Conference on Intelligent Systems for Molecular Biology and 10th European Conference on Computational Biology, Vienna, Austria
Stanley Mbandi Kimbung
31 August – 9 September 2011
EMBO Global Exchange Lecture Course: “Next Generation Sequencing for Africa”, Nairobi, Kenya
Mahjoubeh Jalali
12 – 15 November 2011
6th International Conference on Genomics, Shenzhen, China
Sumir Panji
10 – 13 November 2011
Bringing Together the Tsetse Genome Research Meeting, Sanger Institute, Cambridge, UK
National Conferences: 6 students attended national conferences and presented their research.
Kavisha Ramdayal
30 January – 5 February 2011
EMBO Global Exchange Lecture Course on HIV AIDS, Stellenbosch, South Africa
Fredrick Nindo
30 January – 5 February 2011
EMBO Global Exchange Lecture Course on HIV AIDS, Stellenbosch, South Africa
Samson Muyanga
7 – 11 March 2011
South Africa/Argentina Joint Regional Biosafety Workshop and Seminar “Biosafety of GM crops: Emerging issues and challenges in regulatory decision making”, Pretoria, South Africa
Samson Muyanga
11 – 15 September 2011
Genetically modified organisms in horticulture Symposium, Pretoria, South Africa
Kavisha Ramdayal
25 – 26 October 2011
UWC Faculty of Science Post-Graduate Research Open Day 2011, Bellville, South Africa
Poster: “Characterising HIV-1 subtype C gp160 envelope sequence diversity in the female genital tract and plasma during acute and chronic infection”.
Emil Tanov
25 – 26 October 2011
UWC Faculty of Science Post-Graduate Research Open Day 2011, Bellville, South Africa
Poster: “Identification of biologically important secondary structures in Enterovirus genus”.
Computational Resources
Historically, SANBI has relied on a heterogeneous server infrastructure, with a collection of over a dozen servers ranging from simple Intel PCs to IBM p690s. These have provided web services, a high performance computing infrastructure, and the various infrastructure services (authentication, mail, file server) necessary for the institute's functioning.
In 2011 we started migrating from this old infrastructure (80 CPU cores and 8 TB of disk space, spread across various servers) to a new, more powerful server infrastructure. The core of SANBI's new server infrastructure consists of a set of Dell blade servers, with a total of 88 CPU cores and 512 GB RAM. The blade servers include six M710HD servers, with between 64 and 96 GB of RAM and 12 CPU cores each, and one M910HD with 512 GB of RAM and 16 CPU cores. Together with an R710 rack-mounted server, these provide an environment for both High Performance Computing and widespread use of virtual machine (VM) technology to meet the divergent needs of the Bioinformatics community. We are currently researching "private cloud" solutions to manage this VM infrastructure.
The new servers are connected with each other, and with our new Dell EqualLogic storage area network (SAN), by 10 Gb Ethernet, a tenfold speedup of our network infrastructure. All in all, the new server infrastructure is approximately 12 times more powerful than our old servers.
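As a quick check of the figures quoted above, the following minimal Python sketch tallies the blade-server core counts. The dictionary layout and variable names are illustrative assumptions and are not part of SANBI's own tooling; only the server counts and cores per server come from the text.

```python
# Blade servers as described in the text: model -> (number of servers, CPU cores each).
# The structure and names here are hypothetical, chosen only for illustration.
blades = {
    "M710HD": (6, 12),
    "M910HD": (1, 16),
}

# Total cores across all blade servers.
total_cores = sum(count * cores for count, cores in blades.values())

# Core count of the old infrastructure, as quoted in the text.
old_cores = 80

print(f"New blade cores: {total_cores}")
print(f"Additional cores relative to the old servers: {total_cores - old_cores}")
```

The tally agrees with the 88 CPU cores stated for the blade servers. The "12 times more powerful" estimate evidently reflects more than core count alone, so it is not reproduced here.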
The year 2011 has seen tremendous growth in the sophistication of our computing environment, as we have deployed Puppet for centralised configuration management and are migrating to an "infrastructure as code" approach to deploying and managing server infrastructure. This frees up time and effort for our computer systems group, so that they can spend more time on new infrastructure research and development and less time on maintenance.
The services provided to the SANBI community and their collaborators include:
1) A Grid Engine based compute cluster that allows simplified access to the compute resources we provide. Users of the cluster have access to more than 120 different bioinformatics analysis packages and more than 15 TB of disk space for storing research data and results.
2) Web servers, both for the main SANBI website and for presenting research outputs from SANBI's different research groups.
3) Application servers hosted on our virtual machine infrastructure, which each group within SANBI can configure as it sees fit. This approach is vital in bioinformatics, as the software environment required by different groups can be dramatically different. Providing physical servers for each project is neither feasible nor optimal, whereas virtual machines allow us to provide for the needs of each group without their requirements coming into conflict with one another.
4) Database servers, providing both MySQL and PostgreSQL relational database management systems (RDBMS) for storing relational data.
5) Centralised data collections, including up-to-date versions of major biological databases.
The computer systems group at SANBI continued to grow in capacity throughout 2011 and, through our work at SANBI and collaboration with partners at other institutions, continues to provide for the computing needs of the SANBI research community.
[Figure: Powervault with M710HD]
[Figure: M910HD]
Awards and Honours
Awardee: Alan Christoffels
Awarding Institution: Academy of Science of South Africa
Nature of award/recognition: Membership
Significance: Elected to the academy for recognition of scientific research
Awardee: Alan Christoffels
Awarding Institution: Belgium-Flemish government’s Flanders Care International Visitors’ programme
Nature of award/recognition: 6-member international team visiting health care initiatives in Flemish Belgium
Significance: The Unit Director was the South African representative on the trip with the aim of encouraging international exchange in the Health care sector.
Awardee: Emil Tanov
Awarding Institution: UWC Science Faculty
Nature of award/recognition: 2nd Best poster presented at the Research Open Day
Significance: Voted the 2nd best MSc research paper in the Science Faculty at its open day.
International team visiting healthcare initiatives in Flemish Belgium. From L – R: Mrs Biruta Kleina (Latvia), Mr Yuk Han Sun (Hong Kong), Dr Ilesh Jani (Mozambique), Dr Alan Christoffels (South Africa), Mr Piet Houwen (Vice President and General Manager, GENZYME), Mrs Catharina de Jong (Netherlands), Mr David Jones (UK), Mrs Christine Breugelmans (Senior Communications and PR Officer, Flemish Department of Foreign Affairs)
SANBI in the Media
The article below appeared in the Cape Argus of 27 July 2011. Professor Travers commented on prevention remaining the optimal approach for combating HIV in the future, as no effective vaccine has been developed to date.
Mail & Guardian Supplement, 18 - 25 November 2011
Irish Times publication, 3 May 2011
HIV-related cases pose big problems for criminal law
CLAIRE O'CONNELL
The Irish Times - Tuesday, May 3, 2011
THE CRIMINAL justice system is ill-equipped to deal with the complexities of HIV, and legal decisions need to be based on good science, according to writer and advocate on HIV-related issues Edwin J Bernard, who will give a public seminar in Galway today.
“Although the first laws and prosecutions began in the late 1980s, there are more laws being passed and more prosecutions today than ever before,” said Mr Bernard, who analyses the global criminalisation of people living with HIV.
To date, more than 600 individuals in more than 40 countries have been convicted of HIV exposure or transmission, according to Mr Bernard, who will talk today about how criminal law is applied in such instances.
“I’ll argue that in the vast majority of cases, laws and prosecutions are irrational, unjust and counterproductive, based on stigma not science.”
Situations where a person genuinely intended to do harm should lead to prosecution, noted Mr Bernard.
“If someone has planned to harm someone by not telling them they are HIV-positive and then has unprotected sex with the intention of harming them – and they were actually infected – then these very, very rare cases should be, and are, prosecuted,” he said.
“However, there are many people with HIV in prison, including some who have died in prison, who did not do anything that risked harming someone with HIV, and certainly did not infect anyone.”
The laws and prosecutions also have a wider impact, added Mr Bernard, whose public seminar today is being sponsored by the South African National Bioinformatics Institute.
“For people living with HIV, they can create a climate of fear and uncertainty and media reporting of such cases does nothing to reduce HIV-related stigma, the greatest non-health-related challenge for those living with HIV.
“For everyone else, these laws and prosecutions are creating a distorted picture of HIV-related harm and risk and undermining the public health message that everyone shares responsibility for their sexual health,” he said.
“What UNAIDS and others are working on right now is to ensure that everyone involved in law making and in the criminal justice system understands the latest advances in the science of HIV.
“Justice can be better achieved when laws and legal decisions are based on good science, best practice guidelines for police and prosecutors are in place, and people with HIV accused of such ‘crimes’ have improved access to justice.”
The public seminar, HIV and the Criminal Law: Combating Stigma Through Science, will take place in the MRI Annex, NUI Galway today at 5.30pm and is part of the 18th International Conference on HIV Dynamics and Evolution, hosted by NUI Galway.
Flanders Today publication, 21 December 2011
International VIPs visit Flanders Care
Flanders last week welcomed six international opinion and decision makers as part of the Flanders International Visitors Programme, an ongoing programme of five-day visits focusing on different sectors. The theme this time was Flanders Care: Focus on Innovation and Entrepreneurship in Health Care in Flanders. The six opinion makers were Dr Alan Christoffels from South Africa, Catharina de Jong from the Netherlands, Dr Ilesh Jani from Mozambique, David Jones from the United Kingdom, Yuk Han Sun from Hong Kong and Biruta Kleina from Latvia.
Community Engagement
Over the past three years we have focused on the Life Orientation subject at schools and developed a grade 7-9 resource kit using Tuberculosis as a theme.
In 2008, the Centers for Disease Control and Prevention (USA) funded UWC under an HIV/AIDS public health programme. As part of this funding mechanism, Alan Christoffels from SANBI and Patricia Struthers from the Physiotherapy department were funded to develop educational material for grade 8 learners. During the past three years the resource was tailored to grade 7 - 9 learners to accommodate the national education department criteria. This project targeted 7 pilot schools, namely Elswood, Ravensmead, St Andrews, Sakumlandela, Iingcinga Zethu, York and Langenhoven.
Events that marked the progress on this project in 2011 included (a) completion of the curriculum content into 8 lesson plans with individual and group activities, (b) graphic illustrations for each lesson plan and (c) development of an electronic interactive DVD that replaced the information content in the lesson plans together with digital story-telling and games. The final year (2012/13) will be devoted to printing of 4000 copies of both the hard copy and DVD for each learner in the pilot schools and external monitoring and evaluation of the use of the resource kit.
[Figure: Interactive DVD and workbook (panels labelled "TB in sputum" and "HIV"). Grade 7-9 interactive life orientation resource kit to be piloted in 7 schools in the Western Cape.]
Research Outputs
Summary:
Output Type: Output Total
Journal Publications: 11
Book Chapter: 1
Software: 6
Invited Talks: 8
Conference Talks: 10
Conference Posters: 15
Theses: 6
Theses Examined: 4
Peer reviewed journal articles:
1. Samuel Kwofie, Ulf Schaefer, Vijayaraghava Sundararajan, Vladimir Bajic, Alan Christoffels. HCVpro: Hepatitis C virus protein interaction database. Infection, Genetics and Evolution. Dec 2011. 11(8):1971-1977. Impact factor: 3.086
2. Samuel K Kwofie, Aleksandar Radovanovic, Vijayaraghava S Sundararajan, Monique Maqungo, Alan Christoffels and Vladimir B Bajic. Dragon Exploratory System on Hepatitis C Virus (DESHCV). Infection, Genetics and Evolution. 2011. 11(4):734-9. Impact factor: 3.086
3. Sarah Mwangi, Edwin Murungi, Mario Jonas and Alan Christoffels. Evolutionary Genomics of Glossina morsitans immune-related Serine proteases and Serine Protease inhibitors. Infection, Genetics and Evolution. 2011. 11(4): 740-745. Impact factor: 3.086
4. Vladimir Shulaev, Daniel J Sargent, Ross N Crowhurst, Todd C Mockler, Otto Folkerts, Arthur L Delcher, Pankaj Jaiswal, Keithanne Mockaitis, Aaron Liston, Shrinivasrao P Mane, Paul Burns, Thomas M Davis, Janet P Slovin, Nahla Bassil, Roger P Hellens, Clive Evans, Tim Harkins, Chinnappa Kodira, Brian Desany, Oswald R Crasta, Roderick V Jensen, Andrew C Allan, Todd P Michael, Joao Carlos Setubal, Jean-Marc Celton, D Jasper G Rees, Kelly P Williams, Sarah H Holt, Juan Jairo Ruiz Rojas, Mithu Chatterjee, Bo Liu, Herman Silva, Lee Meisel, Avital Adato, Sergei A Filichkin, Michela Troggio, Roberto Viola, Tia-Lynn Ashman, Hao Wang, Palitha Dharmawardhana, Justin Elser, Rajani Raja, Henry D Priest, Douglas W Bryant Jr, Samuel E Fox, Scott A Givan, Larry J Wilhelm, Sushma Naithani, Alan Christoffels et al. The genome of woodland strawberry (Fragaria vesca). Nature Genetics. 2011. 43: 109-116. doi:10.1038/ng.740. Impact factor: 36.377
5. Vijayaraghava Seshadri Sundararajan, Musa Nur Gabere, Ashley Pretorius, Saleem Adam, Alan Christoffels, Minna Lehvaslaiho, John A. C. Archer and Vladimir B. Bajic. DAMPD: a manually curated antimicrobial peptide database. Nucleic Acids Research, 2011 (1-5). doi:10.1093/nar/gkr1063. Impact factor: 7.836
6. Niamh E. Redmond, Jean Raleigh, Rob W. M. van Soest, Michelle Kelly, Simon A. A. Travers, Brian Bradshaw, Salla Vartia, Kelly M. Stephens, Grace P. McCormack. Phylogenetic Relationships of the Marine Haplosclerida (Phylum Porifera) Employing Ribosomal (28S rRNA) and Mitochondrial (cox1, nad1) Gene Sequence Data. PLoS ONE. 2011. 6(9): e24344. Impact factor: 4.411
7. Semegni JY, Wamalwa M, Gaujoux R, Harkins GW, Gray A, Martin DP. NASP: a parallel program for identifying evolutionarily conserved nucleic acid secondary structures from nucleotide sequence alignments. Bioinformatics. 2011 Sep 1;27(17):2443-5. Epub 2011 Jul 14. Impact factor: 4.877
8. Monjane AL, Harkins GW, Martin DP, Lemey P, Lefeuvre P, Shepherd DN, Oluwafemi S, Simuyandi M, Zinga I, Komba EK, Lakoutene DP, Mandakombo N, Mboukoulida J, Semballa S, Tagne A, Tiendrébéogo F, Erdmann JB, van Antwerpen T, Owor BE, Flett B, Ramusi M, Windram OP, Syed R, Lett JM, Briddon RW, Markham PG, Rybicki EP, Varsani A. Reconstructing the history of Maize streak virus strain A dispersal to reveal diversification hot spots and its origin in southern Africa. Journal of Virology. 2011 Sep;85(18):9623-36. Epub 2011 Jun 29. Impact factor: 5.189
9. Bansode VB, Travers SAA, Crampin AC, Ngwira B, French N, Glynn JR and McCormack GP. (13 October 2011). Reverse Transcriptase drug resistance mutations in HIV-1 Subtype C infected patients on ART in Karonga District, Malawi. AIDS Research and Therapy. 2011, 8(38). doi:10.1186/1742-6405-8-38. Impact factor: 1.77
10. Lefeuvre P, Harkins GW, Lett JM, Briddon RW, Chase MW, Moury B, Martin DP. Evolutionary time-scale of the begomoviruses: evidence from integrated sequences in the Nicotiana genome. PLoS One. 2011;6(5):e19193. Epub 2011 May 16. Impact factor: 4.411
11. Seager I, Leeson MD, Crampin A, Mulawa D, French D, Glynn J, Travers S.A.A., McCormack GP. HIV-1 mutational patterns in HIV-1 subtype C infected long-term survivors in Karonga District Malawi: correction and further analysis. AIDS Research and Human Retroviruses. 2011 August 30. Epub ahead of print. Impact factor: 2.082
TOTAL JOURNAL IMPACT FACTOR: 76.211
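The reported total can be checked by summing the eleven per-article impact factors. The short Python sketch below is illustrative only: the variable names are assumptions, and the pairing of 7.836 with article 5 is inferred from the order of the values in the table rather than stated explicitly in the source.

```python
from decimal import Decimal

# Impact factor per article number, as listed in the table.
# Assumption: 7.836 belongs to article 5 (Nucleic Acids Research).
impact_factors = {
    1: "3.086",
    2: "3.086",
    3: "3.086",
    4: "36.377",
    5: "7.836",
    6: "4.411",
    7: "4.877",
    8: "5.189",
    9: "1.77",
    10: "4.411",
    11: "2.082",
}

# Decimal avoids binary floating-point rounding when adding the values.
total = sum(Decimal(value) for value in impact_factors.values())

print(f"Articles: {len(impact_factors)}")
print(f"Total journal impact factor: {total}")
```

Summed this way, the eleven values agree with the total of 76.211 given in the table.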
Chapters in books:
1. Tiffin, N. Conceptual thinking for prioritization of candidate disease genes. In Methods in Molecular Biology - In Silico Tools for Gene Discovery. Editors: Bing Yu; Marcus John Hinchcliffe. Publisher: Humana Press
Software and similar outputs developed or generated/implemented:
Year
Software Resources
Impact
2011
miRNA targets in insects http://insectar.sanbi.ac.za
Out of a PhD project – tool to predict miRNA targets in insects with a focus on mosquitoes
2011
HCV protein interaction database http://apps.sanbi.ac.za/hcvpro
Output of a PhD project – hepatitis C virus resource that identifies protein-protein interactions
2011
Glossina Genomics Analysis Resource http://iggiweb.sanbi.ac.za/markw/
Output of a PhD project - A central portal for disease vector research on the African Continent
2011
Arabidopsis Stress-related Transcription Factor Database http://apps.sanbi.ac.za/dastf/
Output of an MSc project – regulatory motifs identified in response to environmental stress on plants.
2011
Grade 7-12 learners and Teachers e-learning resource kit http://skills4life.org
Interdisciplinary project- An interactive DVD was finalised during 2011 together with a workbook. This electronic resource was launched in March 2012.
2011
NASP: Nucleic Acid Structure Predictor http://web.cbio.uct.ac.za/~yves/nasp_portal
Inter-university project - A parallel program for identifying evolutionarily conserved nucleic acid secondary structures from sequence alignments
Invited plenary:
Jan 2011
Invited Talk
Simon Travers. Institute of Immunology Seminar Series, National University of Ireland, Maynooth. Uncovering the pathways of HIV resistance to antiretroviral therapies.
Feb 2011
Keynote
Simon Travers. Bifx Africa-India joint virtual conference. Feb 2011. A tale of two pathways: HIV resistance to treatment with CCR5 antagonists.
Mar 2011
Invited Talk
Simon Travers. Molecular and Cell Biology Seminar Series, University of Cape Town. Understanding the mechanisms of HIV-1 drug resistance.
May 2011
Invited Talk
Alan Christoffels. Dentistry Faculty Research Day, UWC. Genomic Strategies to combat Tuberculosis.
Sep 2011
Invited Talk
Simon Travers. Biomedical Sciences Seminar Series, Tygerberg Hospital, Cape Town. Investigating resistance to next-generation HIV therapeutic interventions.
Nov 2011
Invited Talk
Simon Travers. National Institute for Communicable Diseases (NICD) 454 Sequencing Platform Launch. 09 November 2011. Amplicon-based approaches using 454 sequencing in viral and metagenomic studies - a bioinformatics perspective.
Nov 2011
Invited Talk
Junaid Gamieldien. Novartis Research Day, UWC. Biomedical Knowledge Integration to Support Clinical Genomics.
Nov 2011
Invited Talk
Alan Christoffels. Tsetse Genomics, Wellcome Trust Genome Campus, Cambridge, UK. Annotation of immunity genes.
Conference presentations or posters:
Jan 2011
Talk
Simon Travers. Pathogen Biology and Evolution Meeting. Strasbourg, France. What now? HIV research in the South African National Bioinformatics Institute.
Feb 2011
Talk
Junaid Gamieldien. Bifx Africa-India joint virtual conference. Feb 2011. Semantic Integration of Biomedical Knowledge and Existing Data to Support In-Silico Discovery
Feb 2011
Talk
Nicki Tiffin. Bifx Africa-India joint virtual conference. Feb 2011. A bioinformatics perspective on human disease genetics
Mar 2011
Talk
Gordon Harkins. ISCB Africa ASBCB Conference on Bioinformatics 2011. Cape Town, South Africa, March 2011. The spread of Tomato yellow leaf curl virus from the Middle East to the world.
Mar 2011
Talk
Simon Travers. ISCB Africa ASBCB Conference on Bioinformatics 2011. Cape Town, South Africa, March 2011. A tale of two pathways: Characterising HIV resistance to treatment with CCR5 antagonists.
Mar 2011
Talk
Ruben Cloete, Ekow Oppon, Alan Christoffels. ISCB Africa ASBCB Conference on Bioinformatics 2011. Cape Town, South Africa, March 2011. In-Silico TB drug design using comparative genome analysis of DS, MDR and XDR isolates from KZN.
Mar 2011
Talk
Samuel Kwofie, Vlad Bajic, Alan Christoffels. ISCB Africa ASBCB Conference on Bioinformatics 2011. Cape Town, South Africa, March 2011. Inferring enriched biological information from graphs composed of text-derived biomedical concepts of ontologies related to Hepatitis C Virus.
Mar 2011
Talk
George Obiero. ISCB Africa ASBCB Conference on Bioinformatics 2011. Cape Town, South Africa, March 2011. Comparative Annotation and Analysis of Protein-Coding DNA Sequences of Theileria parva Marikebuni against Theileria parva Muguga genomes.
Mar 2011
Poster
Roetz, N, Möller, M, Tiffin, N, Christoffels A, Hoal, E. ISCB Africa ASBCB Conference on Bioinformatics, Cape Town, South Africa, March 2011. Variants in Host Susceptibility to Tuberculosis using SNP Array.
Mar 2011
Poster
Tiffin, N, Hofmann, O, The SysCo Consortium, Schwegmann, A, Brombacher, F, Hide, W. ISCB Africa ASBCB Conference on Bioinformatics, Cape Town, South Africa, March 2011. Analysis of differential gene expression and regulatory networks in Leishmania-infected macrophages from susceptible and resistant mouse strains.
Mar 2011
Poster
Stanley Kimbung, Jean-Marc Celton, Oreetseng Moncho, Lizex Husselman, Adugna Woldesemayat, Joseph Mafofo, Peter van Heusden, Jasper Rees, Alan Christoffels. ISCB Africa ASBCB Conference on Bioinformatics, Cape Town, South Africa, March 2011. A Computational Framework for Venturia inaequalis Genomics.
Mar 2011
Poster
Emily Tangie, Vincent Titanji, Alfred Ngwa, Damian Anong, Stanley Mbandi, Ivo Tening, Raymond Yengo. ISCB Africa ASBCB Conference on Bioinformatics, Cape Town, South Africa, March 2011. Differential Immunoglobulin G Response to UB05- a Potential Plasmodium falciparum Vaccine Target.
Mar 2011
Poster
Alecia Naidu, Mmakamohelo Direko, Peter van Heusden, Junaid Gamieldien, Paul van Helden, Rob Warren, Nico Gey van Pittius, Alan Christoffels. ISCB Africa ASBCB Conference on Bioinformatics, Cape Town, South Africa, March 2011. Development of a computational framework for the management of next generation Mycobacterial Sequencing Data.
Mar 2011
Poster
Musa Gabere, Alan Christoffels, William Noble, Vladimir Bajic. ISCB Africa ASBCB Conference on Bioinformatics, Cape Town, South Africa, March 2011. HAPP: Haematophagus antimicrobial peptide predictor.
Mar 2011
Poster
Jean Yves Semegni, Mark Wamalwa, Gordon Harkins, Alistair Gray, Darren P Martin. ISCB Africa ASBCB Conference on Bioinformatics, Cape Town, South Africa, March 2011. NASP: A parallel program for identifying evolutionarily conserved nucleic acid secondary structures from sequence alignments.
Mar 2011
Poster
Samson Muyanga, Ashley Pretorius, Firdous Khan, Alan Christoffels. ISCB Africa ASBCB Conference on Bioinformatics, Cape Town, South Africa, March 2011. Computational Discovery of Carotenoid Pathway Regulatory Networks.
Mar 2011
Poster
Mark Wamalwa. ISCB Africa ASBCB Conference on Bioinformatics, Cape Town, South Africa, March 2011. The transcriptome profile of Glossina morsitans morsitans: a vector for sleeping sickness.
Mar 2011
Poster
Adugna Woldesemayat, Junaid Gamieldien, Bongani Ndimba, Alan Christoffels. ISCB Africa ASBCB Conference on Bioinformatics, Cape Town, South Africa, March 2011. Computational identification of candidate genes for drought tolerance in Sorghum (Sorghum bicolor (L.) Moench).
May 2011
Talk
Simon Travers. 18th International HIV Evolution and Dynamics Meeting. Galway, Ireland. 1 - 4 May 2011. Drug Resistance in HIV-1 Subtype C infected patients on ART in Karonga District, Malawi using consensus and next-generation sequencing.
Jun 2011
Talk
Simon Travers. 5th SA AIDS Conference, Durban. 7-10 June 2011. Characterising the Emergence, Prevalence and Persistence of Drug Resistant Variants in the Viral Population of HIV-1 Subtype C Infected Individuals.
Mar 2011
Poster
Z.Chikwambi, A Christoffels and D.J.G Rees (2011). Fruit. Biotechnology Fruit Conference. Pretoria. Developmental peel and pulp tissue specific mRNA expression profiling in Malus x domestica Borkh. Cv “Golden Delicious”
Sep 2011
Poster
Samson Muyanga, Firdous Khan and Alan Christoffels. Horticulture GMO symposium. Kruger National Park, Mpumalanga. Sep 2011. Computational discovery of carotenoid pathway networks.
Nov 2011
Poster
Junaid Gamieldien. The 6th International Conference on Genomics. Shenzhen, China. Driving Disease Gene Discovery with Biomedical Semantic Networks.
Nov 2011
Poster
Mahjoubeh Jalali Sefid Dashti, Van Velden DP, Gamieldien J, Fisher LR, Marnewick J, Kidd M, Kotze M. The 6th International Conference on Genomics. Shenzhen, China. Evaluation of high-throughput methodology for multi-gene screening in South Africans at risk of cardiovascular disease.
Nov 2011
Poster
GFO Obiero, PO Mireji, A Christoffels and D Masiga. ICIPE Research Open Day, Nairobi, Kenya. Bioinformatics approaches to finding the tsetse fly nose.
Nov 2011
Poster
Tanov, E, Martin, DP, Muhire, B, Golden, M and Harkins, GW. UWC Life Science Research Open Day. Identification of biologically important secondary structures in Enterovirus genus.
Nov 2011
Poster
Lambson, B, Ramdayal, K, Moore, PL, Abrahams, MR, Bandawe, Karim, SA, Williamson, C, Martin, DP, Harkins, GW and Morris, L. UWC Life Science Research Open Day. Characterizing HIV-1 subtype C gp160 envelope sequence diversity in the female genital tract and plasma during acute and chronic infection.
Theses:
Name
Degree
Thesis
Musa Gabere
PhD, UWC
Prediction of antimicrobial peptides using hyperparameter optimised support vector machines.
Samuel Kwofie
PhD, UWC
Development of a Hepatitis C virus knowledgebase with computational prediction of functional hypothesis of therapeutic relevance.
Monique Maqungo
PhD, UWC
Prostate cancer knowledgebase with functional genomic data analysis.
Conor Meehan
PhD, National University of Ireland, Galway
Understanding the interaction between HIV-1 and chemokine receptors during host cell entry with focus on the potential for resistance to CCR5-antagonists.
Mark Wamalwa
PhD, UWC
Development of a comprehensive annotation and curation framework for analysis of Glossina morsitans morsitans expressed sequence tags.
Saleem Adam
MSc, UWC
A knowledge base of stress response gene-regulatory elements in Arabidopsis thaliana.
Thesis examination for students from other institutions:
Alan Christoffels
University of Pretoria, University of Cape Town
PhD, MSc
Junaid Gamieldien
University of Stellenbosch
MSc
Nicki Tiffin
University of the Witwatersrand
MSc
Expert panel or professional membership:
Christoffels
• Board of Directors – International Society for Computational Biology
• Scientific Committee Chair for the African Society of Bioinformatics and Computational Biology Meeting in Cape Town
• Scientific committee: International Conference on Bioinformatics (InCoB): InCoB11 Malaysia Nov 2011. Official annual conference of the Asia-Pacific Bioinformatics Network (APBioNet)
• NRF Rating Panel
Travers
• Scientific committee: 19th International HIV Dynamics and Evolution Conference
• Review panel for South African Research Chairs Initiative (NRF): Bioinformatics and Functional Genomics Stream
Policy briefs:
Human Heredity and Health in Africa (H3Africa)
In response to the Human Heredity and Health in Africa (H3Africa) joint initiative funded by the Wellcome Trust (UK) and the National Institutes of Health (USA), SANBI has partnered with multiple African researchers to propose a Pan African Bioinformatics network. While the outcome of this proposal is not available yet, SANBI will be one of the centers of excellence that will support the genetics community in Southern Africa.
Additionally, SANBI is the bioinformatics partner in multiple disease-specific research networks under this initiative.
Southern African Human Genome Programme (SAHGP)
Alan Christoffels, together with six other academics across the country, submitted a successful proposal to the Department of Science and Technology during June 2010 for seed funding to support a planning meeting to formalise the SAHGP and develop a 5-year plan for the SAHGP in South Africa. The planning meeting was held in January 2011 and attended by 65 researchers. A proposal to the value of 20 million rand has been submitted to the Department of Science and Technology.
Intervention programme:
As part of an inter-disciplinary project funded through the CDC-funded initiative, we have completed the alpha version of an interactive electronic medium (a DVD) to be used in schools for education relating to tuberculosis and life skills. The final printing of the DVD and manuals was completed in December 2011 and the final phase of piloting will take place in 2012. The electronic teaching tool has been placed on the internet as part of a web resource to mirror the classroom toolkit (www.skills4life.org).
Research Projects Overview
Communicable Diseases
Mycobacterium tuberculosis:
Virulence mutations: In collaboration with the Tygerberg MRC unit, we are developing methods to analyse high throughput sequencing data for microbial genomes.
Identification of novel drug targets in pathways known to contain drug resistance genes.
Malaria: In collaboration with the National Institute for Communicable Diseases (NICD), we are investigating miRNA targets in Anopheles funestus to understand regulation of mosquito development.
HIV research:
HIV drug resistance: In collaboration with groups in Malawi and Ireland, we are studying CXCR4-usage during disease progression.
Development of a software solution for user-friendly HIV drug resistance testing using the 454 sequencing platform.
A longitudinal study investigating the evolution of HIV-1 subtype C viruses in the female genital tract relative to the blood plasma.
Non-communicable Diseases
Multiple Sclerosis: Development of an exome sequencing and knowledge discovery pipeline to identify rare mutations.
Breast Cancer: Development of a statistical method for cross-platform microarray analysis.
Agricultural research programme
Fungal-host pathogens: In collaboration with Dr Rees, we are developing tools for capturing genomic data from crop and fungal genomes and for identifying pathogenic genes in infected groups.
Forthcoming Projects Rat model for post-traumatic stress disorder: In collaboration with a group at Tygerberg medical school, we will be extending our exome sequencing discovery pipeline to an analysis of RNASeq data obtained from a rat model. Salt-sensitive hypertension in African populations: In collaboration with clinicians at Groote Schuur we will sequence candidate genes in a series of 300 samples from normotensives, hypertensives and salt-sensitive hypertensives from the isiXhosa-speaking population in Cape Town, using next generation sequencing techniques. This data will be compared to existing data from northern hemisphere populations.
Research Laboratories
PI: Prof Alan Christoffels
DST/NRF Research Chair in Bioinformatics and Health Genomics 2009 - 2012
My genomics laboratory is primarily focused on developing methods to better understand host-pathogen interactions. Since 2010, my group has been working on applications of next generation sequencing technology to the understanding of diseases that impact health in South Africa and on the African continent. The interaction networks between host and pathogen are being studied in tuberculosis, in blood-borne disease vectors (tsetse and Anopheles) and, more recently, in fungal invasion of economically important crops in South Africa. My Research Chair in Bioinformatics currently supports 6 MSc students, 7 PhD students and 4 postdoctoral fellows, either directly or through external grants. The research activities have focused on next generation sequencing data and on developing methods to efficiently manage and analyse high-throughput data. The momentum of next generation sequencing in South Africa has resulted in our participation in additional projects to support bioinformatics needs.
Outputs:
• International and national conference presentations, 5 publications in 2011, 2 internships and 4 bioinformatics tools.
• Internships:
– Edwin Murungi (Mark Field’s Lab, Cambridge University, UK)
– Oreetseng Moncho (European Bioinformatics Institute, UK)
Tuberculosis (http://www.sanbi.ac.za/tb_genomics/)
Tuberculosis (TB) is prevalent in sub-Saharan Africa and, within South Africa, the incidence of TB in the Western Cape is among the highest in the country. Together with HIV, TB forms a deadly combination. The severity of TB prevalence in South Africa is complicated by the presence of drug-resistant TB. Intervention strategies for TB range from clinical trials for new TB drug treatment to improved surveillance of multi-drug resistance informed by advances in TB diagnostic tests. Researchers at Tygerberg Medical School have sequenced clinical isolates of TB in South Africa. In collaboration with these investigators, we are:
• developing and implementing methodology for short read data from bacterial genomes
• developing bioinformatics resources for managing these genetic data sets to accelerate deeper insights into the underlying mechanisms of host evasion and virulence factors
• developing methods to correlate the genetic variation in TB isolates with an expanded interaction network of virulence TB genes
• identifying novel drug targets using in silico docking studies.
Collaborators: Profs Eileen Hoal Van Helden, Nico Gey Van Pittius, Rob Warren and Dr Cedric Werley, Tygerberg Medical School, University of Stellenbosch. Dr Ekow Oppon, South African Medical Research Council.
Blood-borne disease vectors (http://www.sanbi.ac.za/disease_vectors)
Sleeping Sickness
Tsetse (Glossina) is the vector for trypanosomes, which cause, among other diseases, human African trypanosomiasis (HAT). There are more than 300,000 cases of HAT with millions more people at risk in 37 countries in Africa. Although not present in South Africa, HAT is prevalent in neighboring countries, with new cases being reported in countries such as Zimbabwe, Zambia and Mozambique. Insights into the interaction of the trypanosome and the host could promote improved intervention strategies. Together with our collaborators (listed in brackets) we are investigating:
• annotation of the tsetse genome (International Glossina Genome Initiative (IGGI))
• machine learning methods to identify immunity genes (Vlad Bajic, KAUST)
• comparative genomics of serine protease inhibitors (IGGI)
• protein-protein interactions between trypanosome and tsetse proteins (Prof Mark Field, Cambridge and Prof Henry Nyongesa, Computer Science, UWC)
IGGI consortium, including:
Matt Berriman, Sanger Centre
Serap Aksoy, Yale University
Dan Masiga, International Centre of Insect Physiology and Ecology, Kenya
Mike Lehane, Liverpool School of Tropical Medicine
Role of miRNA in Anopheles vectoral capacity (http://insectar.sanbi.ac.za)
miRNAs play an essential role in gene regulatory networks by controlling the expression of genes involved in important biological processes in the cell. In insects, thousands of miRNA genes have been identified, but the function of most of these miRNAs remains unknown due to a lack of experimental and computational approaches to predict their exact target mRNAs. In collaboration with Prof Lizette Koekemoer, we are developing an integrated system to identify miRNA targets in Anopheles and other insects.
Diseases of Apple (http://www.sanbi.ac.za/agri_genomics)
Apple scab is one of the most destructive diseases of apple (Malus x domestica Borkh.) and is caused by the hemi-biotrophic fungus Venturia inaequalis (Cooke) Winter. Scab is a serious problem in all apple-producing regions of the world and requires a series of 12 to 15 fungicide sprays per year in commercial orchards. In total, eight races of the scab pathogen have been defined by incompatibility, determined by avirulence genes (avr genes), on corresponding host cultivars carrying a major resistance gene (R gene). Next generation sequencing data have been generated for apple fungal infection. Data management and downstream analysis protocols require a computational framework that allows users to engage with the data and promotes hypothesis driven research. The following bioinformatics projects are underway:
• establishing a local instance of ENSEMBL Plant and Fungal databases, genome browser and communication protocols
• transcriptome profiling of host-pathogen response
• integrating transcription data, SNP and CNV data and information from more selective sequencing strategies such as ChIP-Seq and Bisulphite sequencing (for methylation analysis).
Collaborator: Professor Jasper Rees - Agricultural Research Council
PI: Dr Junaid Gamieldien
Knowledge Integration & Biomarker Discovery Group
Core Project: Semantic Integration of Biomedical Knowledge
Our core research project, which many of our other projects rely on, focuses on the semantic integration and re-use of high-value biomedical information in the public domain to: 1) enable in-silico experimentation that encompasses multiple knowledge domains and 2) contextualise the results of high throughput experiments. We use a knowledge representation technique known as a semantic network, which is stored in a next-generation graph database. This greatly simplifies the integration of complex biological information in a way that mirrors how biologists think about and reason across it. The flagship project is focused on human health and seamlessly integrates hundreds of thousands of human, mouse and rat gene, gene-to-disease, gene-to-phenotype and gene-to-pathway relationships. The semantic database is particularly relevant in the disambiguation of experiments that generate large numbers of leads. For example, high throughput technologies like next generation sequencing make it possible to identify multiple gene candidates that may be of biomedical interest. We have tested the utility of our semantic database in clarifying the often unclear or unapparent links between novel mutations recently reported in the literature and the diseases or phenotypes being investigated, and have found that we can often provide better insights than the original publication. Our system often also uncovers the underlying biological mechanisms that lead to the development of phenotypes associated with a disease. While the utility of the current semantic network is clear, we are constantly adding relevant genomic information to the system and will prioritise adding genome scale knowledge on gene expression in specific tissues to the semantic network in 2012.
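To make the idea of reasoning across a semantic network more concrete, the following is a minimal sketch in Python. It is not the group's system: the in-memory class stands in for a graph database, and the relation names and the identifiers GENE_A, PHENOTYPE_P and so on are hypothetical placeholders chosen only for illustration. It shows how a gene can be linked to a disease either directly or through a shared phenotype.

```python
from collections import defaultdict


class SemanticNetwork:
    """Tiny in-memory stand-in for a graph database of typed relationships."""

    def __init__(self):
        # subject -> list of (relation, object) edges
        self.edges = defaultdict(list)

    def add(self, subject, relation, obj):
        self.edges[subject].append((relation, obj))

    def neighbours(self, node, relation):
        return [o for r, o in self.edges[node] if r == relation]

    def diseases_for_gene(self, gene):
        """Diseases linked to a gene directly or via a shared phenotype."""
        found = set(self.neighbours(gene, "associated_with_disease"))
        for phenotype in self.neighbours(gene, "associated_with_phenotype"):
            found.update(self.neighbours(phenotype, "phenotype_of_disease"))
        return sorted(found)


net = SemanticNetwork()
# Hypothetical placeholder facts, not curated biology.
net.add("GENE_A", "associated_with_disease", "DISEASE_X")
net.add("GENE_A", "associated_with_phenotype", "PHENOTYPE_P")
net.add("PHENOTYPE_P", "phenotype_of_disease", "DISEASE_Y")
net.add("GENE_B", "associated_with_phenotype", "PHENOTYPE_P")

print(net.diseases_for_gene("GENE_A"))
print(net.diseases_for_gene("GENE_B"))
```

In this toy network GENE_B has no direct disease annotation, yet the query still reaches DISEASE_Y through the phenotype it shares, which is the kind of indirect link the paragraph above describes.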
Project 2: Disease Gene Discovery with Genome Sequencing and Semantic Networks
We are developing an exome sequencing + knowledge-discovery pipeline to identify the genetic cause(s) of disease in a patient with multiple sclerosis. We are developing a novel concept to prioritise mutations by mining the database for genes associated with ‘surrogate’ or ‘secondary’ phenotypes linked to disease (e.g. demyelination in multiple sclerosis). This includes direct human gene to phenotype information as well as transitive associations via model organism evidence (gene knockout phenotypes). The latter has the potential to identify rare mutations associated with a phenotype that would otherwise be missed, and formally mapping between phenotypes and diseases in the semantic network will therefore be prioritised in 2012. Other disease cases will also be sequenced and a version of the method will be applied in a large-scale RNAseq project studying a rat model of post-traumatic stress disorder.
Project 3: Statistical Methods for Cross-platform Microarray Analysis in Cancer
A large volume of health research focused gene expression data exists in public repositories, and there is a significant opportunity to re-use microarray data in various combinations for novel in-silico analyses that would otherwise be too costly to perform. For example, thousands of cancer experiments, where the aim was to identify genes being differentially expressed in normal versus tumour tissue, are available. We have developed a method for combining and re-analysing large numbers of data sets that may have been generated on different technology platforms as a means to increase the statistical power of the meta-analysis, while weakening the effects of individual study-specific biases. We are applying this method to identify driver genes in the development and metastasis of breast cancer. In 2012 the method will be applied to machine learning based classification of tumours (e.g. benign, malignant, metastatic, drug resistant), based on gene expression signatures identified in the large merged datasets.
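The report does not describe the statistical method itself, so the sketch below swaps in a textbook technique to illustrate the general principle: a fixed-effect meta-analysis that converts each study to a unit-free standardised mean difference (Cohen's d) and pools the studies with inverse-variance weights. The function names and the expression values are hypothetical; the two toy studies deliberately use different measurement scales to mimic different platforms.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(tumour, normal):
    """Standardised mean difference and its approximate sampling variance."""
    n1, n2 = len(tumour), len(normal)
    pooled_sd = sqrt(
        ((n1 - 1) * variance(tumour) + (n2 - 1) * variance(normal)) / (n1 + n2 - 2)
    )
    d = (mean(tumour) - mean(normal)) / pooled_sd
    var_d = (n1 + n2) / (n1 * n2) + d * d / (2 * (n1 + n2))
    return d, var_d


def combine_fixed_effect(effects):
    """Inverse-variance weighted mean of (d, var_d) pairs and its standard error."""
    weights = [1.0 / v for _, v in effects]
    pooled = sum(w * d for w, (d, _) in zip(weights, effects)) / sum(weights)
    return pooled, sqrt(1.0 / sum(weights))


# Hypothetical values for one gene measured on two platforms with different scales.
study_a = cohens_d(tumour=[8.1, 8.4, 8.9, 8.6], normal=[7.0, 7.2, 6.9, 7.4])
study_b = cohens_d(tumour=[1200, 1350, 1280, 1400], normal=[1000, 1050, 980, 1100])

pooled_d, pooled_se = combine_fixed_effect([study_a, study_b])
print(f"pooled d = {pooled_d:.2f} (SE {pooled_se:.2f})")
```

Because each study is reduced to a scale-free effect size before pooling, platform-specific units drop out, and the pooled standard error is smaller than that of either study alone, which is the gain in statistical power referred to above.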
PI: Dr Gordon Harkins
My research primarily focuses on the evolution and molecular epidemiology of ssDNA and RNA viral pathogens of animals and plants. I am a key member of a fledgling, but already highly productive, plant-virus epidemiology network seeking to determine the evolutionary underpinnings of the emergence and spread of the numerous novel geminiviral agricultural diseases that seriously threaten the food security of Africa and the rest of the developing world. Together with my collaborators I have been investigating nucleotide sequence data from a broad range of virus species to determine characteristic features of the population histories, evolution rates and migration patterns associated with the geminivirus emergence events that have recently been detected in Africa, South America and the Pacific Rim. Besides informing policy makers on the potential risks associated with relaxed controls on the movements of agricultural produce, this work will hopefully identify correlates of impending virus emergence that could form the basis of a much needed pandemic early warning system (such as those currently in place for influenza A). A summary of some of the research projects that I have been involved in during 2010-2011 is presented below.
The identification of biologically important secondary structures in single stranded RNA and DNA viral pathogens
Besides a capacity to store information within the sequences of their component nucleotides, the genomes of RNA viruses can also potentially store information within their folded secondary structures. RNA viral genomes often contain conserved secondary structures that play a vital role during the various stages of the viral life cycle, influencing many biological processes such as genome replication, viral packaging, intracellular trafficking, gene expression and genetic recombination.
While much is known about regulatory motifs in RNA at the 5’ and 3’ untranslated regions, most potential regulatory elements within RNA viral genomes likely remain uncharacterised. These regulatory motifs constitute an important component of the genetic code and as such indicate that much remains to be discovered by the analyses of single stranded RNA/DNA viral genomes and intact messenger RNAs (mRNAs). Therefore, an efficient and accurate structure prediction methodology can give vital directions to experimental studies aiming to evaluate the function of these conserved secondary structure architectures. We have devised such a tool, NASP (Nucleic Acid Structure Prediction), which identifies evolutionarily conserved nucleic acid secondary structures (Semegni et al. 2011). It takes as input a nucleotide sequence alignment and returns the most probable evolutionarily conserved consensus secondary structure. Downloadable and web-based versions of NASP are freely available at http://web.cbio.uct.ac.za/~yves/nasp_portal.php
NASP: A Parallel Program for Identifying Evolutionarily Conserved Nucleic Acid Secondary Structures from Sequence Alignments. Semegni et al. 2011 Bioinformatics 27: 2443-2445.
The evolution and molecular epidemiology of Tomato yellow leaf curl virus (TYLCV)
The ongoing global spread of Tomato yellow leaf curl virus (TYLCV; Genus Begomovirus, Family Geminiviridae) represents a serious looming threat to tomato production in all temperate parts of the world. Whereas determining where and when TYLCV movements have occurred could help curtail its spread and prevent future movements of related viruses, determining the consequences of past TYLCV movements could reveal the ecological and economic risks associated with similar viral invasions.
Towards this end we applied Bayesian phylogeographic inference and recombination analyses to available TYLCV sequences (including those of 15 new Iranian full TYLCV genomes) and reconstructed a plausible history of TYLCV’s diversification and movements throughout the world. In agreement with historical accounts, our results suggest that the first TYLCVs most probably arose somewhere in the Middle East between the 1930s and 1950s (with 95% highest probability density intervals 1905–1972) and that the global spread of TYLCV only began in the 1980s after the evolution of the TYLCV-Mld and -IL strains. Despite the global distribution of TYLCV we found no convincing evidence anywhere other than the Middle East and the Western Mediterranean of epidemiologically relevant TYLCV variants arising through recombination.
The Spread of Tomato Yellow Leaf Curl Virus (TYLCV) from the Middle East to the World. Lefeuvre et al. 2010 PLoS Pathogens 6(10): e1001164. doi:10.1371/journal.ppat.1001164.
Determining the long-term evolutionary rate of geminivirus integrons from Nicotiana genomes
Whereas analyses of geminivirus substitution rates estimated using temporally structured datasets have indicated that these single stranded DNA viruses are evolving as fast as many animal and plant RNA viruses, it is still unknown when these viruses originated. Current hypotheses range from their having originated long before the evolution of flowering plants (>130 MYA) to their being only a few hundred thousand years old. A recently discovered geminivirus fossil within the genomes of some tobacco species indicates that relatively modern looking geminivirus-like viruses must have already been in existence between 0.2 and 9 MYA. We are attempting to use the reconstructed ancestral sequences of the integrated geminivirus sequences to place upper and lower bounds on the date when geminiviruses originated. This project brings sophisticated molecular clock analyses of plant and virus sequences together with both geological data on continental drift and paleontological fossil data on plant and insect evolution to reveal what will be the first concrete geological time-frame history of a modern virus family.
The Time-scale of Begomovirus Evolution: Evidence from Integrated Sequences in the Nicotiana genome. Lefeuvre et al. 2011 PLoS ONE 6(5): e19193. doi:10.1371/journal.pone.0019193.
The historical spatial diffusion dynamics of Maize streak virus strain A (MSV-A) Maize streak virus strain A (MSV-A), the etiological agent of maize streak disease, represents one of the most serious biotic threats to African food security. Determining where MSV-A originated and how it spread transcontinentally could yield valuable insights into its historical emergence as a crop pathogen. Similarly, determining where the major extant MSV-A lineages arose could identify geographical hot spots of MSV evolution. We have used model-based phylogeographic analyses of 353 fully sequenced MSV-A isolates to reconstruct a plausible history of MSV-A movements over the past 150 years. We show that since the probable emergence of MSV-A in southern Africa around 1863, the virus spread transcontinentally at an average rate of 32.5 km/year (95% highest probability density interval, 15.6 to 51.6 km/year). Using distinctive patterns of nucleotide variation caused by 20 unique intra-MSV-A recombination events, we tentatively classified the MSV-A isolates into 24 easily discernible lineages. Despite many of these lineages displaying distinct geographical distributions, it is apparent that almost all have emerged within the past 4 decades from either southern or east-central Africa. Collectively, our results suggest that regular analysis of MSV-A genomes within these diversification hot spots could be used to monitor the emergence of future MSV-A lineages that could affect maize cultivation in Africa. Reconstructing the History of Maize Streak Virus Strain-A Dispersal to Reveal Diversification Hotspots and its Initial Origins in Southern Africa. Monjane et al. 2011 The Journal of Virology, September, Vol. 85, No. 18 p9623-9636. 
Reconstructing the evolutionary history of psittacine beak and feather disease
Psittacine beak and feather disease (PBFD) is one of the most devastating emerging diseases affecting both wild and captive psittacine birds and poses a serious threat to the health of pet birds and the conservation of threatened species. First described in 1975 in various species of Australian cockatoos, the disease has since been reported in more than 60 psittacine species in eleven countries around the globe. Beak and feather disease virus (BFDV), family Circoviridae (genus Circovirus), has been identified as the etiological agent, with surveys indicating that prevalence rates vary between 10% and 94% in both captive and wild psittacines. This study, conducted by James Matthews, is the first to fully exploit a recently published Bayesian phylogeographic inference method to better understand viral emergence and geographical dissemination of BFDV using full-genome data. Under this framework we will model spatial diffusion on time-measured genealogies as a continuous-time Markov chain over discrete sample locations. This temporal-spatial process will be simultaneously integrated with well-established models of sequence evolution in a Bayesian genealogical approach using the software package BEAST (Bayesian Evolutionary Analysis of Sampling Trees), allowing for the inference of historical spatial dynamics over time. Furthermore, we are evaluating a range of potential predictors of viral dissemination between pairs of countries as phylogeographic models and fitting these models individually to the BFDV whole genome sequence data. These include (i) geographical distances between countries, (ii) the number of psittacines transported annually from one country to another (with directionality), (iii) the psittacine population size in the country of origin, (iv) the psittacine population size in the country of destination, and (v) the product of the psittacine population sizes in the country of origin and the country of destination.
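The continuous-time Markov chain mentioned above can be illustrated with a small numerical sketch. This is not BEAST's implementation, and the three locations and their migration rates are hypothetical values chosen only for the example. Given a rate matrix Q whose off-diagonal entries are migration rates between locations, the probability of moving from location i to location j over a time t is the (i, j) entry of exp(Qt), computed here with a truncated Taylor series that is adequate for small rate matrices.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def rate_matrix(rates):
    """Set each diagonal entry so its row sums to zero (a valid CTMC generator)."""
    n = len(rates)
    q = [row[:] for row in rates]
    for i in range(n):
        q[i][i] = -sum(rates[i][j] for j in range(n) if j != i)
    return q


def transition_probabilities(q, t, terms=30):
    """P(t) = exp(Q t), approximated by the first `terms` Taylor series terms."""
    n = len(q)
    qt = [[q[i][j] * t for j in range(n)] for i in range(n)]
    result = [[float(i == j) for j in range(n)] for i in range(n)]  # identity
    term = [row[:] for row in result]
    for k in range(1, terms):
        # term becomes (Q t)^k / k!
        term = [[x / k for x in row] for row in mat_mul(term, qt)]
        for i in range(n):
            for j in range(n):
                result[i][j] += term[i][j]
    return result


# Hypothetical migration rates (events per lineage per year) among three locations.
locations = ["A", "B", "C"]
rates = [
    [0.00, 0.10, 0.02],
    [0.05, 0.00, 0.03],
    [0.01, 0.04, 0.00],
]
q = rate_matrix(rates)
p = transition_probabilities(q, t=5.0)

for name, row in zip(locations, p):
    print(name, [round(x, 3) for x in row])
```

Each row of the result is a probability distribution over destination locations after five years. In a predictor-based model, the off-diagonal rates would be expressed as functions of quantities such as distance or trade volume rather than fixed by hand as they are here.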
PI: Dr Nicki Tiffin
Introduction
I work on human genetics underlying disease, specifically in African populations, aiming to characterise genetic diversity in South African patient populations within the disease context. I research generic computational disease gene prediction, candidate disease gene prioritisation for specific diseases, and genetics of host response to infectious disease. Ongoing projects include a collaborative project establishing a registry of patients from Cape Town who have systemic lupus erythematosus (SLE). We are building a database for effective storage and datamining of extensive clinical and biochemical data for these patients, and will use these data to design and implement -omic studies to further elucidate the genetic and environmental contributors to this disease. I also work with clinical collaborators to investigate genetic factors underlying susceptibility to salt-sensitive hypertension in South African patients. I continue research in the area of generic approaches to computational disease gene prioritisation, and I am completing research with the SYSCO Consortium investigating the response of host macrophages to infection with Leishmania major, with our findings under preparation for publication.
Research projects
1. Genetic factors underlying systemic lupus erythematosus (SLE) in South African patients
SLE is a multi-systemic autoimmune disease with a broad range of clinical presentations, and high associated morbidity and mortality. The incidence and prevalence of SLE vary significantly in different ethnic groups and populations (1, 2), including the South African patient population (3). There is, however, scant data on SLE in sub-Saharan Africa (4).
We have established a comprehensive registry of SLE patients at Groote Schuur Hospital, Cape Town (the first 100 patients were recruited by December 2011) to aid better understanding of the occurrence and the biochemical, clinical, diagnostic, prognostic, therapeutic and ‘quality of life’ features of SLE in South Africa, and to better provide appropriate treatment for South African patients. The registry is a resource for research into the genetics underlying SLE in South Africa, and DNA samples are being biobanked for future molecular research. Two research papers are under consideration for publication.
Collaborators:
Dr Ikechi Okpechi (MBBS, FWACP, PhD)
Dr Asgar Kalla (MB ChB, FCP (SA), MD, FRCP (Lond))
Dr Ayanda Gcelu (MBChB, FCP (SA), MPH)
2. Genetic factors underlying salt-sensitive hypertension in South African patients
This is an ongoing collaboration to investigate candidate genes for salt-sensitive hypertension, which presents with sustained elevation in blood pressure with no known underlying cause. The heritability of hypertension ranges from 30% to 60%, with variable clinical presentation and drug response (5, 6), and salt-sensitive hypertension appears to be more prevalent in people of indigenous African origin (7-9). We have previously identified candidate genes for salt-sensitive hypertension in Africans, and are applying next-generation sequencing methods to identify disease-associated variations in these genes in indigenous African patients and controls.
Collaborators:
Professor Brian Rayner (MBChB, FCP SA)
Mr CJ Van Heerden (Central Analytical Facilities, DNA Sequencing Unit, Stellenbosch University)
3. Generic approaches to disease gene prioritisation
We are developing a novel approach to enable disease gene prediction using the position of genes within the genome structure.
This approach is the first of its kind to look at the frequency of recombination and proximity of recombination hotspots in relation to the likelihood of neighbouring genes to be implicated in disease.
Epidemiological evidence has clearly and consistently shown that disease occurrence and the genetics underlying disease can vary substantially between populations and ethnic groups (Via et al. 2009). We are investigating the prioritisation of disease genes in a population/ethnic-specific way given the haplotype block structure of a population.
Masters Student: Ms Tracey Kibler, SANBI, UWC
Research Output: Tiffin, N. Book Chapter: Methods in Molecular Biology: In Silico Tools for Gene Discovery. Chapter title: Conceptual thinking for prioritization of candidate disease genes. Methods Mol Biol. 2011;760:175-87.
4. Host genetics underlying response to Leishmania major
Leishmaniasis is a severe disease caused by protozoan Leishmania parasites, transmitted by the bite of the sand fly. We are completing three years of studies by the SYSCO Consortium, funded by the European Union 6th Framework Programme. I work on computational aspects of the project, analysing gene expression array data from host macrophages. Manuscripts presenting the data from these studies are under preparation.
Collaborators: The Sysco Consortium (http://asahttp.drim.com/syscoproject/), Dr Frank Brombacher and Dr Anita Schwegmann
Outputs: Tiffin, N., Hofmann, O., The SysCo Consortium, Schwegmann, A., Brombacher, F., Hide, W. Analysis of differential gene expression and regulatory networks in Leishmania-infected macrophages from susceptible and resistant mouse strains.
Poster: ISCB Africa ASBCB Conference on Bioinformatics, Cape Town, South Africa, 2011.
Future direction
I have been actively involved in the formation of the AfriCRAN Consortium, which is an African-wide collaboration aiming to elucidate genetic and environmental causes of craniofacial abnormalities in African populations; and I have joined research networks actively seeking funding for research into genetics of Alzheimer’s disease, pharmacogenomics in African populations, kidney disease in Africa, and the development of the African Bioinformatics Network.
References:
1. Hopkinson, N. D., Doherty, M. & Powell, R. J. (1994) Ann Rheum Dis 53, 675-80.
2. Johnson, A. E., Gordon, C., Palmer, R. G. & Bacon, P. A. (1995) Arthritis Rheum 38, 551-8.
3. Okpechi, I. G., Rayner, B. L., van der Merwe, L., Mayosi, B. M., Adeyemo, A., Tiffin, N. & Ramesar, R. (2010) PLoS One 5, e9086.
4. Bae, S. C., Fraser, P. & Liang, M. H. (1998) Arthritis Rheum 41, 2091-9.
5. Shih, P. A. & O'Connor, D. T. (2008) Hypertension 51, 1456-64.
6. Lifton, R. P., Gharavi, A. G. & Geller, D. S. (2001) Cell 104, 545-56.
7. Weinberger, M. H. (1996) Hypertension 27, 481-90.
8. Sullivan, J. M., Prewitt, R. L. & Ratts, T. E. (1988) Am J Med Sci 295, 370-7.
9. Rayner, B. L., Myers, J. E., Opie, L. H., Trinder, Y. A. & Davidson, J. S. (2001) S Afr Med J 91, 594-9.
PI: Prof Simon Travers
Simon Travers is the principal investigator of the HIV molecular evolution research group. He completed his undergraduate degree in Biotechnology at the National University of Ireland, Maynooth in 2001 and his PhD (Bioinformatics) in 2004, also at NUI Maynooth. Following his PhD he undertook post-doctoral research with Dr Mario Fares in NUI Maynooth and Trinity College, Dublin. In late 2006 he received funding from the Irish Health Research Board (HRB) and established his research group, initially in NUI Maynooth before moving to NUI Galway. He has been at SANBI since April 2010. His research focuses on the implementation of computational approaches to study various aspects of HIV evolution. He is particularly interested in the study of drug resistance in HIV and in more recent years this focus has shifted to using ultradeep sequencing approaches to characterise the entire spectrum of viral variants present within HIV infected individuals. In particular, he is interested in understanding the role of low abundance drug resistant variants in treatment outcome. Further research interests include using molecular phylogenetics to understand viral diversity and evolution, studying the molecular mechanisms driving coreceptor tropism switch in HIV, and characterising N-linked glycosylation in HIV to further understand the therapeutic potential of N-linked glycans.
The role of N-linked glycosylation in HIV
As part of the post-translational processing of an HIV virion, carbohydrates are added to the surface of the virion by the host's glycosylation mechanism. The binding of such N-linked glycans conveys protection to a virion's surface proteins by acting as a shield to avoid detection by the host's immune system.
These carbohydrates, however, may comprise a novel target for HIV therapeutics, and Natasha Wood (postdoctoral researcher) is currently studying the three-dimensional properties of this 'glycan shield' to further understand its therapeutic potential.
Collaborators:
Prof Robert Woods, NUI Galway, Ireland
Dr Elisa Fadda, NUI Galway, Ireland
Dr Simon Lovell, University of Manchester, UK
Development of 454 analysis pipelines
Ram Krishna Shrestha (PhD student) is working on a project focused on the management and analysis of HIV-1 ultra-deep sequencing data. He is developing methods to process and analyse HIV sequence data generated using 454 sequencing technology for the detection of drug resistance. His project also focuses on the analysis of 454 sequence data from individuals infected with HIV-1 subtype C. These data are being used to examine the effect of minor-variant drug-resistant viruses on the treatment outcome of these individuals.
Collaborators:
Dr Grace McCormack (NUI Galway, Ireland)
The Karonga Prevention Study (Malawi)
Prof Maria Papathanasopoulos (Wits Medical School)
Outputs:
A talk at the launch of the NICD 454 sequencing platform.
A novel method for the quality control of 454 sequence data (paper submitted to Bioinformatics).
The role and mechanisms of CXCR4-usage in HIV-1 subtype C
Saleema Crous (MSc student) is working on coreceptor usage in HIV-1 subtype C. Transmitted viruses mostly use the CCR5 chemokine receptor as a coreceptor for host cell entry. Using sequence data from viruses whose coreceptor preference is known, this project aims to better understand the molecular mechanisms that lead to the change of coreceptor usage in HIV-1 subtype C.
Outputs:
Identified the optimal method for the prediction of CXCR4-usage in subtype C sequences (paper submitted).
Molecular phylodynamics of HIV focusing on the transmission of HIV drug resistance
Using data collected from a number of HIV cohorts, Fredrick Nindo (MSc student) is using molecular phylodynamic approaches to better understand the epidemiology of these cohorts. Combining sequence data with epidemiological information, Fred is using phylogenetic approaches to model the extent of HIV transmission networks within these populations. The resulting transmission clusters will be further correlated with the presence of drug-resistance mutations to describe the levels of transmission of resistant virus between individuals.
Collaborators:
Dr Grace McCormack (NUI Galway, Ireland)
The Karonga Prevention Study (Malawi)
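The minor-variant analyses described for the 454 projects in this lab can be illustrated with a small sketch. It is not the group's pipeline. It assumes that reads have already been quality-filtered, translated and aligned to a common amino-acid frame, and the function names, the toy reads and the 1% reporting threshold are all hypothetical choices made for illustration.

```python
from collections import Counter


def residue_frequencies(aligned_reads, position):
    """Return the frequency of each amino acid at a 0-based position.

    Reads that do not cover the position, or that show a gap ('-') or an
    unknown residue ('X') there, do not count towards the depth.
    """
    observed = [
        read[position]
        for read in aligned_reads
        if position < len(read) and read[position] not in "-X"
    ]
    depth = len(observed)
    if depth == 0:
        return {}
    counts = Counter(observed)
    return {residue: count / depth for residue, count in counts.items()}


def minor_variants(aligned_reads, position, wild_type, min_fraction=0.01):
    """Return non-wild-type residues at or above min_fraction (hypothetical cutoff)."""
    freqs = residue_frequencies(aligned_reads, position)
    return {
        residue: freq
        for residue, freq in freqs.items()
        if residue != wild_type and freq >= min_fraction
    }


# Toy data (invented): 97 reads with K at position 1 and 3 reads with N.
toy_reads = ["VKK"] * 97 + ["VNK"] * 3
low_abundance = minor_variants(toy_reads, position=1, wild_type="K")
```

In practice the reporting threshold would have to be set above the platform's error rate at that position, which is why the cutoff is exposed as a parameter rather than fixed.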
Research Collaborations
Genetic factors underlying systemic lupus erythematosus (SLE) in South African patients
Nicki Tiffin collaborating with:
Dr Ikechi Okpechi, Dr Asgar Kalla, Dr Ayanda Gcelu
Department of Nephrology and Hypertension, UCT/Groote Schuur Hospital
Nature and purpose: This collaboration has established a patient registry and database for SLE patients in Cape Town, for the purpose of clinical and genetic research into this disease. Patients are recruited to the registry on an ongoing basis from the clinics at Groote Schuur Hospital, patient DNA is biobanked for future research, and biochemical and clinical data are collected and databased at each patient visit.
Output in the last 12 months: Research outputs for 2011 included two papers submitted to peer-reviewed journals in November 2011: “Clinicopathological insights into lupus nephritis in South Africans: a study of 251 patients” and “A diverse array of genetic, cellular and environmental factors converge in the pathogenesis of Systemic Lupus Erythematosus”.
Future direction: We will be conducting a thorough analysis of the clinical and biochemical features of the patients who have been recruited to the patient registry thus far (110 patients, Feb 2012), and are constructing our relational database for effective entry, storage and querying of patient data. We are actively seeking funding to undertake research into the genetics and molecular processes underlying lupus in these patients, and will also be performing computational analyses to predict candidate disease genes for lupus.
Genetic factors underlying salt-sensitive hypertension in South African patients
Nicki Tiffin collaborating with:
Prof Brian Rayner
Department of Nephrology and Hypertension, UCT/Groote Schuur Hospital
Mr C. J. Van Heerden
Central Analytical Facilities, DNA Sequencing Unit, Stellenbosch University
Nature and purpose: This collaboration aims to elucidate the genetic factors underlying salt-sensitive hypertension in African patients.
Output in the last 12 months: In 2011 we selected a group of patients with salt-sensitive hypertension, a group of patients with essential hypertension (not salt-sensitive), and a group of normotensive controls. In all, about 300 samples have been submitted for next-generation sequencing to identify variations in a primary candidate gene.
Future direction: The data generated from the sequencing analysis will be used to identify variants in the PTH gene that might contribute to salt-sensitive hypertension in Cape Town patients.
Host genetics underlying response to Leishmania major
Nicki Tiffin collaborating with:
The SysCo Consortium
For the full members list see http://asahttp.drim.com/syscoproject/
Drs Frank Brombacher and Anita Schwegmann
Immunology and Infectious Diseases, ICGEB/UCT
Nature and purpose: This collaboration, funded by the European Union 6th Framework Programme, aims to elucidate the genetic factors and regulatory pathways that underlie the response of host macrophages to infection with L. major.
Output in the last 12 months: Tiffin, N., Hofmann, O., The SysCo Consortium, Schwegmann, A., Brombacher, F., Hide, W. Analysis of differential gene expression and regulatory networks in Leishmania-infected macrophages from susceptible and resistant mouse strains. Poster: ISCB Africa ASBCB Conference on Bioinformatics, Cape Town, South Africa, 2011.
Future direction: This project is completed and preparation of findings for publication is ongoing.
Susceptibility to Mycobacterium tuberculosis infection in HIV-negative patients
Nicki Tiffin collaborating with:
Prof Eileen Hoal Van Helden
Biomedical Sciences, Stellenbosch University
Nature and purpose: This collaboration involves the computational analysis and databasing of genotyping data generated from M. tuberculosis-infected patients and matched controls.
Output in the last 12 months: Roetz, N., Möller, M., Tiffin, N., Christoffels, A., Hoal, E. Poster: Detecting Copy Number Variants in Host Susceptibility to Tuberculosis using SNP Array. ISCB Africa ASBCB Conference on Bioinformatics, Cape Town, South Africa, 2011.
Genomic and proteomic determinants of Mycobacterium tuberculosis phenotypic characteristics
Alan Christoffels and Junaid Gamieldien collaborating with:
Prof NC Gey van Pittius, Prof Rob Warren
Stellenbosch University
Nature and purpose: An analysis of the genomic, transcriptomic and proteomic variations giving rise to phenotypic characteristics in strains of Mycobacterium tuberculosis which enhance its ability to survive within its host and evade the host’s defense mechanisms.
Output in the last 12 months: High-throughput sequencing data for 8 TB genomes were received and assembled at SANBI, together with preliminary annotations. PhD student Alecia Naidu presented her computational pipeline at the international Bioinformatics Conference in Cape Town in March 2011. MSc student Mmakamohelo Direko submitted her thesis in December 2011, in which she assembled the M. oryx genome and identified SNPs that will be validated in 2012. A web resource was developed to access the TB data (www.sanbi.ac.za/tb_genomics).
Future direction: The annotated TB genomes will be analysed during 2012 to validate the nucleotide variations and rearrangements in the TB genome, and the results will be submitted for publication. We will apply our protocols to a large number of newly sequenced TB strains. Students will be trained in the laboratory of our collaborator at Tygerberg Medical School.
Development of an integration knowledge system for complex data
Junaid Gamieldien collaborating with:
Dr Veronique Vaslin, Prof David Klatzman, Dr Adrien Six
Immunologie-Immunopathologie-Immunothérapie research institute, Paris
Nature and purpose: The immunology institute in Paris has generated immunomics-type experimental data on a large scale but does not have adequate methods to store and manage this information. The SANBI/MRC unit will develop an integration database system tailored to immunology-type datasets.
Output in the last 12 months: Veronique Vaslin visited SANBI to explore the details of the collaboration. Junaid Gamieldien developed a prototype of the integration system that will be used for the French collaboration, and presented it at the immunology institute in Paris during 2011.
Future direction: Further development of the data integration system to support the immunology-type datasets. A shared PhD student is being recruited for 2012.
International Glossina Genome Initiative (IGGI) Consortium
Alan Christoffels collaborating with IGGI Consortium members, including:
Serap Aksoy
Yale University, US
Dan Masiga
International Centre of Insect Physiology and Ecology, Kenya
Matt Berriman
Sanger Institute, UK
Loyce Okedi
National Livestock Health Research Institute, Tororo, Uganda
Mike Lehane
Liverpool School of Tropical Medicine, Liverpool, UK
Nature and purpose: To sequence the Glossina morsitans genome in order to rapidly provide an evaluation of the translational impact on eradication of the vector of sleeping sickness in Africa. (Present on all borders of South Africa.)
Output in the last 12 months: Sequencing and assembly of the Glossina morsitans genome in October 2011. Multiple working groups in the consortium were assigned specific sections to write for the genome paper.
Future direction: Analysing expression data to be compiled as satellite papers after the genome paper is published. Hosting a workshop in Kenya for genome annotation.
RNA Secondary Structure Prediction
Gordon Harkins collaborating with:
Y. Semegni, D. Martin
Institute of Infectious Diseases and Molecular Medicine, UCT
A. Varsani
School of Biological Science, University of Canterbury, New Zealand
M. Wamalwa
BecA-ILRI hub Nairobi, Kenya
Nature and purpose: While much is known about regulatory motifs in RNA at the 5’ and 3’ untranslated regions, most potential regulatory elements within RNA viral genomes likely remain uncharacterised. These regulatory motifs constitute an important component of the genetic code and as such indicate that much remains to be discovered by the analysis of single-stranded RNA/DNA viral genomes and intact messenger RNAs (mRNAs). An efficient and accurate structure prediction methodology can therefore give vital direction to experimental studies aiming to evaluate the function of these conserved secondary structure architectures. We have devised such a tool, NASP (Nucleic Acid Structure Prediction), which identifies evolutionarily conserved nucleic acid secondary structures (Semegni et al. 2011). It takes as input a nucleotide sequence alignment and returns the most probable evolutionarily conserved consensus secondary structure. NASP provides statistical support for the folding predictions and for the overall presence of secondary structure. By combining these predictions with co-variation, recombination and synonymous substitution rate analysis, conserved RNA secondary structures can be reliably identified. This allows us to test for a) evidence of purifying selection pressures acting upon synonymous sites within protein coding regions, b) evidence that sites predicted to be paired within secondary structures are co-evolving and c) evidence that recombination that naturally occurs among virus genomes has tended to preserve the secondary structures more than would be expected if the observed recombination events were randomly distributed throughout the genome.
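As an illustration of test (b) above, co-variation between two alignment columns is often summarised with mutual information. The sketch below is not NASP's algorithm or its statistical test. It is a generic mutual-information calculation, and the function name and toy alignment are assumptions made for this example.

```python
from collections import Counter
from math import log2


def mutual_information(alignment, i, j):
    """Mutual information (in bits) between alignment columns i and j.

    Sequences with a gap ('-') in either column are ignored.
    """
    pairs = [(seq[i], seq[j]) for seq in alignment
             if seq[i] != "-" and seq[j] != "-"]
    n = len(pairs)
    if n == 0:
        return 0.0
    pair_counts = Counter(pairs)
    left_counts = Counter(a for a, _ in pairs)
    right_counts = Counter(b for _, b in pairs)
    mi = 0.0
    for (a, b), count in pair_counts.items():
        p_ab = count / n
        p_a = left_counts[a] / n
        p_b = right_counts[b] / n
        mi += p_ab * log2(p_ab / (p_a * p_b))
    return mi


# Toy alignment (invented): columns 0 and 4 always form a G-C or C-G pair,
# as would be expected for two sites that base-pair in a conserved stem.
toy_alignment = [
    "GAAAC",
    "GAAAC",
    "CAAAG",
    "CAAAG",
]
paired_mi = mutual_information(toy_alignment, 0, 4)    # compensatory changes
unpaired_mi = mutual_information(toy_alignment, 0, 1)  # column 1 is invariant
```

A high value for a pair of columns that are also predicted to be base-paired is the kind of signal a co-variation analysis looks for; on real data the score would need a significance test, for example against shuffled columns.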
Output in the last 12 months: Downloadable and web-based versions of the software programme Nucleic Acid Structure Prediction (NASP) are freely available at http://web.cbio.uct.ac.za/~yves/nasp_portal.php. A single publication: Semegni et al. 2011. NASP: A Parallel Program for Identifying Evolutionarily Conserved Nucleic Acid Secondary Structures from Sequence Alignments. Bioinformatics 27: 2443-2445.
Future direction: A master’s student is currently investigating the evolutionarily conserved nucleic acid secondary structures identified in the positive-sense single-stranded RNA viral families Picornaviridae and Caliciviridae. Specifically, he is looking for a) evidence of purifying selection pressures acting upon synonymous sites within protein coding regions, b) evidence that sites predicted to be paired within secondary structures are co-evolving and c) evidence that recombination that naturally occurs among virus genomes has tended to preserve the secondary structures more than would be expected if the observed recombination events were randomly distributed throughout the genome.
Characterizing HIV-1 subtype C gp160 envelope sequence diversity in the female genital tract and plasma during acute and chronic infection
Gordon Harkins collaborating with:
L. Morris, P. Moore, B. Lambson
National Institute for Communicable Diseases, Pretoria
M. Abrahams, D. Martin, G. Bandawe
Institute of Infectious Diseases and Molecular Medicine, UCT
P. Lemey
Department of Microbiology and Immunology, Leuven University, Belgium
S. Karim
CAPRISA, University of KwaZulu Natal, Durban
Nature and purpose: We are investigating whether differences exist between human immunodeficiency viruses in the female genital tract and blood plasma through a longitudinal study of the sequence diversity in HIV-1 infected patients during acute and chronic infection. To date we have acquired HIV-1 envelope sequence data from sampling and sequencing work conducted by CAPRISA and the National Institute for Communicable Diseases (NICD). The data comprise a cohort of four HIV-1 positive females who have not been exposed to antiretroviral treatment (ART) throughout their participation in this study, with a total of 449 samples collected at time intervals ranging from 14 to 1316 days post sero-conversion. Using a hierarchical Bayesian statistical inference approach we have estimated the number of variants with which each patient was infected and evaluated the degree of viral compartmentalisation of HIV-1 subtype C viruses within the female genital tract and blood plasma. An improved understanding of viral evolution within the female genital tract during acute and chronic infection should contribute to the development of more effective treatments and prevention strategies to block or reduce heterosexual and perinatal transmission of HIV.
Output in the last 12 months: A single paper is in preparation for submission to a peer-reviewed journal.
Future direction: A comparative analysis of structured coalescent and hierarchical phylogenetic viral diffusion models is currently being performed, and a single paper and at least three conference presentations are expected to result from this study.
Geminivirus Collaborative Network
Gordon Harkins collaborating with:
D. Martin
Institute of Infectious Diseases and Molecular Medicine, UCT
D. Shepherd
Department of Molecular and Cell Biology, UCT
J. Khan
Department of Crop Sciences, Sultan Qaboos University, Oman
A. Varsani
Department of Biological Sciences, University of Canterbury, New Zealand
J. Brown
Plant Pathology Department, University of Arizona, USA
P. Lemey
Department of Microbiology and Immunology, Leuven University, Belgium
P. Roumagnac
CIRAD, Montpellier, France
J. Lett, P. Lefeuvre
CIRAD-Université de la Réunion, Île de la Réunion
Nature and purpose: Pervasive food insecurity is a major determinant of health in sub-Saharan African countries, where life-expectancy rates remain among the lowest in the world and where malnutrition ranks among the greatest causes of ill health. While the causes of this situation are undoubtedly multifactorial, crop losses due to geminiviral disease remain high on the African continent, seriously undermining both the food and economic security of the over 300 million sub-Saharan Africans who are dependent on subsistence farming. The SANBI viral pathogen genetics team forms part of a pan-African network of crop scientists and virologists that is conducting the world’s largest ongoing plant pathogenic virus diversity studies. The primary focus of this collaborative group is the comparative analysis of cassava mosaic virus (CMV), tomato yellow leaf curl virus (TYLCV) and maize streak virus (MSV) disease transmission dynamics throughout Africa and the world. We are conducting cutting-edge epidemiological research that has identified the predominant CMV, TYLCV and MSV genotypes that will confront resistant transgenic cultivars in different parts of the continent and the world, and has determined the historical migration pathways, movement rates and heterogeneity in spatiotemporal spread of these viruses across Africa and the world. The virus movement rate estimates and the movement pathways identified in this study will be important parameters in future disease forecasting efforts that could directly benefit hundreds of millions of small-scale farmers throughout Africa.
Output in the last 12 months: 2 publications. Lefeuvre et al. 2011. The Time-scale of Begomovirus Evolution: Evidence from Integrated Sequences in the Nicotiana genome. PLoS ONE 6(5): e19193. doi:10.1371/journal.pone.0019193.
Reconstructing the History of Maize Streak Virus Strain-A Dispersal to Reveal Diversification Hotspots and its Initial Origins in Southern Africa
Gordon Harkins collaborating with (et al):
D. Martin
Institute of Infectious Diseases and Molecular Medicine, UCT
A. Varsani
School of Biological Sciences, University of Canterbury, Christchurch, New Zealand
P. Lefeuvre
CIRAD-Université de la Réunion, Île de la Réunion
P. Lemey
Department of Microbiology and Immunology, Leuven University, Belgium
Nature and purpose: The A strain of maize streak virus (MSV-A) seriously threatens food security in sub-Saharan Africa. We have generated whole-genome MSV-A sequences and, in combination with sampling coordinates and dates, inferred that the virus originated in southern Africa around 1850 and spread across the continent at a rate of approximately 30 km per year. Strikingly, all major contemporary MSV-A lineages arose within the past 50 years from just two well-defined diversification hotspots in south and east Africa. This discovery could dramatically simplify future efforts to monitor the emergence of epidemiologically relevant MSV-A variants.
Output in the last 12 months: Monjane et al. 2011. The Journal of Virology, September, Vol. 85, No. 18, p9623-9636. Featured on the journal cover and in the spotlight section of The Journal of Virology.
Future direction: Gaining an improved understanding of how the virulence of these crop pathogens has changed since their initial emergence as serious agricultural pests in the 17th and 18th centuries and since the widespread growth of resistant (both conventionally bred and transgenic) improved crop genotypes.
ENSEMBL fungal computational framework
Alan Christoffels collaborating with:
Jasper Rees
Agricultural Research Council
Dan Lawson
European Bioinformatics Institute, UK
Nature and purpose: Next generation sequencing technology was used to sequence the fungus Venturia inaequalis, which infects apples. Together with data from the host, these large-scale data can provide insight into the genetic basis of the fungal interaction with the apple plant. Using the ENSEMBL open-source computational framework, we are developing a system for storing and mining genomic data generated in South Africa.
Output in the last 12 months: We have implemented a local version of ENSEMBL at SANBI and have carried out an initial assembly of the Venturia genome. A preliminary annotation of the Venturia genome was produced using a newly implemented method.
One of our MSc students has spent one month at the EBI in Cambridge as part of the skills transfer to SANBI.
Future direction: The fungal annotations will be integrated with existing fungal genome data in ENSEMBL. Development of a method to filter next generation sequencing data for improving genome and transcriptome assembly.
Computational discovery of carotenoid pathway regulatory networks
Alan Christoffels collaborating with:
E. Wurtzel
Lehman College, City University of New York
Nature and purpose: Vitamin A deficiency is associated with the consumption of food crops that are poor sources of provitamin A. However, the incomplete understanding of the regulatory pathway at the systems level is a limiting factor in predictably controlling carotenoid content and composition in cultivars grown around the world. In this project we aim to discover transcriptional regulatory mechanisms controlling plant carotenogenesis.
Output in the last 12 months: Produced an inventory of genes responding to environmental stress and implicated in the carotenogenesis pathway. Identified co-regulated carotenogenesis genes.
Future direction: Prepare a manuscript for publication.
Understanding the molecular mechanisms behind resistance to CCR5 antagonists
Simon Travers collaborating with:
David Robertson, Simon Lovell
University of Manchester, UK
Grace McCormack
National University of Ireland Galway, Ireland
Pfizer Global Research and Development, Sandwich, Kent, UK
Nature and purpose: Use data from the Phase III clinical trials of Pfizer’s CCR5-antagonist maraviroc to understand the viral mechanisms of resistance to CCR5-antagonists.
Outputs in the last 12 months: Two completed manuscripts, which are currently on hold by Pfizer for confidentiality reasons. Graduation of a PhD student (Conor Meehan, NUI Galway).
Future direction: Continue analysing the clinical trials data and use subsequent sequence data obtained from Pfizer to develop sensitive genotypic methods to predict the coreceptor usage of an individual’s viral population, with the intention of predicting the potential for resistance prior to CCR5 antagonist therapy initiation.
HIV genotypic analysis as part of the Karonga Prevention Study (KPS), Malawi
Simon Travers collaborating with:
Grace McCormack
National University of Ireland Galway, Ireland
The London School of Hygiene and Tropical Medicine, UK
KPS, Chilumba, Malawi
Nature and purpose: Characterisation and molecular epidemiology of HIV in Karonga District in Northern Malawi.
Outputs in the last 12 months: Paper published in AIDS Research and Human Retroviruses documenting the prevalence of viral drug-resistance mutations in treatment-naive individuals. Paper published in AIDS Research and Human Retroviruses studying the viral factors associated with long-term survival of HIV-infected individuals. Successful implementation of 454 ultra-deep sequencing of 15 samples from 5 individuals, documenting the prevalence and emergence of minor-variant drug-resistant virions (manuscript in prep).
Future direction: Further studies of the prevalence and emergence of drug resistance in individuals in response to antiretroviral therapy. Characterisation of novel recombinant strains and study of the viral factors for long-term survival in a number of individuals identified in the cohort. Use of sequence data and biological samples to characterise the emergence of CXCR4-usage in subtype C infected individuals. Use of sequence and geographical data to study transmission networks in the cohort.
Understanding HIV-1 drug resistance using 454 ultra-deep pyrosequencing
Simon Travers collaborating with:
Gert van Zyl
NHLS Tygerberg and Stellenbosch University
Nature and purpose: Using 454 sequencing to characterise low-abundance drug-resistant viral variants in individuals failing first- and second-line therapy.
Outputs in the last 12 months: Two manuscripts: one currently under second review in Journal of Virology and another for submission to Journal of Virology in early 2012.
Future direction: Continue to study ultra-deep sequence data and HIV drug resistance.
Towards cost-effective HIV drug resistance testing
Simon Travers collaborating with:
Prof Wendy Stevens, Dr Gillian Hunt, Dr Leigh Berrie, Prof Maria Papathanasopoulos
Head, Department of Molecular Medicine and Haematology, National Priority Program Centre for HIV / STI, National Institute for Communicable Diseases; Co-Director, Genotyping Laboratory, Department of Molecular Medicine and Haematology
Nature and purpose: Explore the use of 454 sequencing to develop a cost-effective, high-throughput approach for HIV drug resistance testing.
Outputs in the last 12 months: 454 sequencing performed on 642 samples from patients with known treatment outcome. Computational pipeline developed for sequence data management and analysis. Manuscript in preparation for submission to New England Journal of Medicine.
Future direction: Use the data acquired in the first phase of the project to determine the clinical relevance of using 454 sequencing for HIV drug resistance testing. Based on these results, implement a high-throughput pipeline from blood sample to result.
Host-pathogen interactions in sleeping sickness
Alan Christoffels collaborating with:
Prof Mark Field
Cambridge University, UK
Prof Henry Nyongesa
Computer Science Department, UWC
Nature and purpose: The flagellar pocket of T. brucei represents a location where trypanosomes and human immune proteins interact. Furthermore, trypanosome-human protein complexes are ingested at the flagellar pocket and are trafficked via an elaborate transport system to vesicles where they are degraded. The limited human-trypanosome protein interaction data led us to use machine-learning approaches to identify key protein interactions, in collaboration with Prof Nyongesa. Unlike SNAREs, proteins such as Rab have been extensively studied in the trafficking process through the trypanosome. Besides the protein-protein interaction (PPI) predictions, we computationally identify the spectrum of SNAREs in trypanosomes, coupled with cell localization assays to confirm the cellular location of these SNAREs, in collaboration with Prof Mark Field.
Outputs in the last 12 months: A PhD student, Edwin Murungi, computationally identified 24 SNAREs in T. brucei. He then spent two months in Cambridge carrying out cell localization assays on 4 SNARE proteins predicted for the trypanosome SNARE repertoire. Using a dataset of trypanosome flagellar proteins and human immune proteins, Edwin predicted human-trypanosome protein interaction networks.
Future direction: Two manuscripts are being prepared for publication. Other machine learning techniques will be assessed to improve algorithm performance in the prediction of protein-protein interactions.
Characterisation of miRNAs in A. funestus
Alan Christoffels collaborating with:
Prof Lizette Koekemoer
National Institute for Communicable Diseases
Nature and purpose: miRNAs have been shown to play a regulatory role in fine-tuning gene expression. The majority of mosquito miRNAs have been identified in A. gambiae, while none have been identified in A. funestus, an important vector on the African continent. Using next generation sequencing technology, we aim to identify miRNAs in A. funestus and predict the miRNA targets.
Outputs in the last 12 months: Small RNAs were isolated from A. funestus and sequenced using Illumina technology. These small RNAs were screened computationally for miRNAs and classified into various categories. miRNA targets were identified using three algorithms and filtered using gene enrichment analysis. Data have been shared via a web portal (insectar.sanbi.ac.za).
Future direction: A number of insect genomes have been sequenced in the past 2 years, including a few blood-feeding vectors. We will compare miRNAs in these disease vectors to identify key mechanisms for parasitic control.
Human Genetic Susceptibility to Tuberculosis
Alan Christoffels collaborating with:
Eileen Hoal van Helden
Stellenbosch University
Peter Witbooi
Mathematics Department, UWC
Nature and purpose: Intra-species protein-protein interaction (PPI) predictions have been attempted for a range of organisms, even though the false positive rate remains high. In this project we are attempting to use supervised and unsupervised algorithms to predict the interactions between human and mycobacterium proteins.
Outputs in the last 12 months: The limited experimental PPI data for human-mycobacteria interactions has shifted our strategy to the application of Bayesian techniques to measure the interactions between human and M. tuberculosis proteins.
Future direction: With new experimental data on the horizon, we will be exploring clustering techniques to enrich for human-mycobacterium PPI.
Financials
The total funding secured at SANBI was R10 380 585.91 for the year 2011, with a consistent 77% average expense to income. 63% of SANBI funding was secured from South African agencies, 24% from UWC and 13% from foreign donors. The South African Research Chair programme provided the largest portion at 41%, followed by UWC at 28%, SA MRC at 18% and other NRF projects at 13%.
Of the total income, salaries accounted for 42%, bursaries accounted for 21% and running costs accounted for 37%.
A number of exciting funding applications were submitted during 2011, including (1) the NIH call for funding on Human Heredity and Health in Africa (H3Africa), (2) a national application to the Department of Science and Technology, SA, for the establishment of a Southern African human genome and (3) a renewal application for the DST/NRF Research Chair in Bioinformatics and Public Health Genomics.
[Figure: Income and Expenditure trends 2004 – 2011]
[Figure: Distribution of income from all sources]
[Figure: Distribution of income from SA sources]
Salaries
Total
Travel
108,781
Overheads Equipment
Postdocs Doctorals Masters Printing Running Internet Telecoms
71,916 1,648,432
146,618
97,683
17,005
5,970
26,275
194,123
609,163
5,576
10,423
53,978
31,701
125,500
12,466
89,342
71,713 3,220,427
15,851
1,001,150
66,732
2,995
650
18,916
15,056
4,676
54,000
312
10,638
101,918
2,349
427,336
51,188
1,778,708 9,101,524
342,104
1,595,690 1,595,690
317,784
468,488
51,128
68,704
40,000
30,830
34,984
9,080
90,811
757,272
91,279
90,967
295,000
730
72,241
50,000
398,978
157,256
81,602
565,000
864,500
80,000
490,000
160,000
680,830
5,970
66,082
2011 Detailed expense report for SANBI (ZAR): Funder 1,282,965
187,863
SA MRC Atlantic Philanthropies
-
3,826,002
200,165
1,153,187
27,260
974,563
World Health Organisation Dean’s Budget NRF Thuthuka NRF Blue Skies NRF Research Chair NRF Vitamin A NRF ENSEMBL DVC Capital Centre for Diseases Control TOTAL
Annual Report 2011
SANBI 2011 End-Of-Year Party
At the end of yet another productive year, SANBI staff and students spent an enjoyable day at Ratanga Junction. The day was filled with team-building activities, and lunch was enjoyed by all. A range of awards was handed out at the event, and everyone looked forward to a well-deserved break.
ALUMNI
Staff:
Name | Institution
Winston Hide | Associate Fellow, Ludwig Institute for Cancer Research; Affiliate Faculty, Harvard Stem Cell Institute; Associate Professor of Computational Biology and Bioinformatics, Department of Biostatistics, Harvard School of Public Health
Vladimir Bajic | Director & Professor: Computational Bioscience Research Center, King Abdullah University of Science and Technology
Heikki Lehvaslaiho | Senior Research Scientist: Computational Bioscience Research Center, King Abdullah University of Science and Technology
Tulio de Oliveira | Senior Bioinformatics Researcher: Africa Centre for Health and Population Studies, University of KwaZulu-Natal
Nicky Mulder | Group Head: Computational Biology Group, University of Cape Town
Cathal Seoighe | Stokes Professor of Bioinformatics: School of Mathematics, Statistics and Applied Mathematics, National University of Ireland, Galway
Postdoctoral Fellows:
Name | Level of study | Date completed | Currently
Soraya Bardien-Kruger | PostDoc | Jun-02 | University of Stellenbosch
Vladimir Babenko | PostDoc | Jun-02 | Senior Staff Scientist, IC&G
Janet Kelso | PostDoc | Oct-04 | Max Planck Institute for Evolutionary Anthropology
Raphael Isokpehi | PostDoc | Dec-04 | Director of the Center for Bioinformatics & Computational Biology at Jackson State University
Konrad Scheffler | PostDoc | Feb-05 | Theodore Gildred Research Facility, University of California, San Diego
Nicki Tiffin | PostDoc | Dec-05 | Senior Lecturer, SANBI, UWC
Gwen Koning | PostDoc | Dec-06 | Global Seed Core Manager, Syngenta Crop Protection AG, Basel, Switzerland
Chris Maher | PostDoc | Dec-07 | Assistant Professor, Washington University School of Medicine
James Patterson | PostDoc | Jun-09 |
Adam Dawe | PostDoc | Aug-09 | Research Scientist, KAUST
Sunil Sagar | PostDoc | Aug-09 | Research Scientist, KAUST
Mandeep Kaur | PostDoc | Aug-09 | Research Scientist, KAUST
Stuart Meier | PostDoc | Aug-09 | Research Scientist, KAUST
Adele Kruger | PostDoc | Feb-10 | Wayne State University, Detroit, Michigan
Oliver Hofmann | PostDoc | Feb-10 | Affiliated Faculty, Harvard Stem Cell Institute; Associate Director at Harvard School of Public Health
Sundarajan Seshadri | PostDoc | Nov-10 | Nanyang Technological University, Singapore
Ashley Pretorius | PostDoc | Dec-10 | Senior Lecturer, Biotechnology, UWC
Jacob Tsotetsi | PostDoc | Dec-11 |
PhD:
Name | Level of study | Date completed | Currently
Alan Christoffels | PhD | 2001 | Interim Director, SANBI, UWC; NRF Research Chairholder
Ekow Oppon | PhD | 2002 | SA MRC
Junaid Gamieldien | PhD | 2002 | Senior Lecturer, SANBI, UWC
Zhuo Zhang | PhD | 2007 | Research Scientist, University of Singapore
Allen Chong | PhD | 2009 | Research Fellow, Beth Israel Deaconess Medical Center, Harvard Medical School
Magbubah Essack | PhD | Sep-09 | Research Scientist, KAUST
Sebastian Schmeier | PhD | Sep-09 | Research Scientist, KAUST
Ulf Schaefer | PhD | Sep-09 | Research Scientist, KAUST
Mark Wamalwa | PhD | Sep-11 | International Livestock Research Institute, Kenya
Musa Gabere | PhD | Sep-11 | University of Namibia, Mathematics Department
Samuel Kwofie | PhD | Sep-11 | UWC Postgraduate Office
Aleksander Radovanovic | PhD | 2010 | Research Scientist, KAUST
MSc:
Name | Level of study | Date completed | Currently
Bukiwe Lupindo | MSc | 2005 | SA Government Administration
Cameron MacPherson | PhD | 2009 | PhD, KAUST
Tzu-Ming Chern | MSc | Mar-03 | Switzerland, IT
Elana Ernstoff | MSc | Dec-03 |
Estienne Swart | MSc | Dec-03 | Graduate Student, Princeton University
Victoria Nembaware | MSc | Dec-03 | Post doc, UCT
Zayed Albertyn | MSc | Dec-03 | Bioinformatics Director, Malaysia
Anelda Boardman | MSc | Mar-04 | Stellenbosch University, Sequencing Facility Manager
Faisel Mosoval | MSc | Mar-05 | Senior Professional Officer, Information Systems and Technology, Business Applications, Business Intelligence and Spatial Development, City of Cape Town
Nothemba Gwija-Kula | MSc | Mar-05 | MRC
Farahnaz Ketwaroo | MSc | Dec-05 | PhD, UCT
Mario Jonas | MSc | Mar-06 | Web Administrator, SANBI, UWC
Oliver Bezuidt | MSc | Dec-07 | PhD, University of Pretoria
Eugene Duvenhage | MSc | Mar-09 | Software Developer, Corporate
Frederick Kamanu | MSc | Mar-09 | PhD, KAUST
Feziwe Mpondo | MSc | Sep-09 | MRC, Research Scientist
Saleem Adam | MSc | Sep-11 |
Firdous Khan | MSc | Mar-12 | PhD, UWC Biotechnology Department
Honours:
Name | Level of study | Date completed
Clifford Omorogie | Hons | Dec-01
Grant Carelse | Hons | Dec-02
Thurayah Davids | Hons | Dec-05
Halimit Ebrahim | Hons | Dec-09
Katlego Motlhatlego | Hons | Mar-12
Siyanda Tsaba | Hons | Mar-12
Stacey Moses | Hons | Mar-12
Currently
MSc, UWC Biotechnology Department
MSc, UWC Biotechnology Department
SANBI | South African National Bioinformatics Institute
Postal Address: South African National Bioinformatics Institute | University of the Western Cape | Private Bag X17 | Bellville | 7535
Physical Address: South African National Bioinformatics Institute | 5th Floor | Life Sciences Building | University of the Western Cape | Modderdam Road | Bellville | 7530 | South Africa
Telephone: +27 (0)21 959-3645 | Facsimile: +27 (0)21 959-2512
Mailing List: [email protected]
Email: [email protected]
Website: www.sanbi.ac.za