G 217 4153/09

Programme Syllabus

Master in Language Technology (One year or Two years) 60/120 higher education credits (Språkteknologi, masterprogram)

H2MLT Second cycle


1 Adoption
The syllabus for the programme Master in Language Technology was approved by the Board of the Faculty of IT on 2009-10-07 and adopted by the Board of the Faculty of Arts on 2009-10-08, with effect from Autumn 2010.
Field of education: Humanities and Information Technology
Department responsible: Department of Philosophy, Linguistics and Theory of Science, in cooperation with the Department of Swedish Language, the Department of Computer Science and Engineering and the Department of Applied Information Technology.

2 Aim
The aim is to provide students who have an appropriate background with advanced knowledge of natural language technology, enabling them to pursue a specialist career in industry or academic research.

3 Learning outcomes
After completing the programme, students will:
1. have advanced knowledge of current technologies used in natural language processing and speech technology
2. have advanced knowledge of the theories and methods applied in current natural language technologies
3. have sufficient programming skills to develop language technology system components
4. have experience of project work as part of a research and/or development team, either through an industrial placement or through a placement in one of the laboratories conducting language technology research at the University of Gothenburg
5. have an appreciation of the human factor and social issues relating to the development and deployment of language technology
6. have an appreciation of the relationship between language technology and other key technologies such as World Wide Web and mobile systems technologies

4 Course content
First year
Semester 1:
Either Introduction for linguists or Introduction for computer scientists (15 hecr)
Programming for NLP (7.5 hecr)
Natural Language Processing (7.5 hecr)
Semester 2:
Speech Technology (7.5 hecr)
Statistical methods for NLP (7.5 hecr)
Masters Project (15 hecr, for students taking a one year degree)
Computational syntax (7.5 hecr, for students taking a two year degree)
Computational semantics (7.5 hecr, for students taking a two year degree)

Second year
Semester 1:
Four courses to be drawn from offerings available from the contributing departments (Department of Philosophy, Linguistics and Theory of Science, Department of Swedish Language, Department of Computer Science and Engineering and Department of Applied Information Technology), to be agreed in consultation with the programme advisor. Depending on availability and interest, options may include the following courses, among others (7.5 hecr each):
Parsing algorithms
Dialogue systems
Machine learning in NLP
Language technology resources
Information retrieval and extraction
Knowledge representation and inference
Options may also include course offerings associated with other degree programmes in the contributing departments.
Semester 2:
Masters Project (30 hecr). Students will conduct their project as part of a team, with a placement either in industry or in one of the language technology laboratories at the University of Gothenburg associated with the university’s focus area on language technology.

Course descriptions

Introduction for linguists, 15 hecr
Introduktion för lingvister
This course will be oriented to students who have a linguistic background but no background in computer science. It will give an intensive introduction to those aspects of computer science which are important for language technology, including data structures, algorithms, programming paradigms, databases and artificial intelligence. It will also include an introduction to programming suitable for students who have not programmed before.

Introduction for computer scientists, 15 hecr
Introduktion för datavetare
This course will be oriented to students who have some computational background but no background in linguistics.
It will provide an intensive introduction to the linguistic study of language, including relevant aspects of the main subfields of linguistics such as phonetics, phonology, morphology, syntax, semantics and pragmatics.

Programming for NLP, 7.5 hecr
Programmering för NLP
This course will provide a practical hands-on introduction to programming for natural language processing, using a language which has practical use both in academic research and in industry. The course will provide a practical conclusion to the two introductions, bringing together the two groups of students.

Natural Language Processing, 7.5 hecr
Komputationell språkbehandling (NLP)
The course is divided into four main topics, each covering a subfield of natural language processing:
1. Words:
  • Finite-state morphology
  • Statistical language modeling (n-grams)
2. Syntax:
  • Part-of-speech tagging
  • Syntactic parsing
3. Semantics:
  • Compositional semantic analysis
  • Word sense disambiguation
4. Pragmatics:
  • Discourse processing
  • Natural language generation
  • Machine translation

Speech Technology, 7.5 hecr
Talteknologi
The course is intended both for students with a limited knowledge of the field and for students with a more extensive background in speech technology. Students with a more extensive background will be expected to take a more active part in the discussion of current research. Introductory lectures will give an overview of the field with an emphasis on basic concepts and standard methods. Specific topics include:
• acoustic phonetics
• multimodal speech synthesis
• speech recognition
• speaker recognition
• dialogue systems
• phonetic analysis

Statistical methods for NLP, 7.5 hecr
Statistiska metoder för NLP
The purpose of this course is to give a research-oriented introduction to probabilistic modeling and statistical methods and to their use within the field of language technology. The following topics will be covered in the course:
• Theory:
  • Probability theory
  • Information theory
  • Statistical theory (sampling, estimation, hypothesis testing)
• Applications:
  • Speech recognition
  • Language modeling
  • Part-of-speech tagging
  • Syntactic parsing
  • Word sense disambiguation
  • Machine translation
  • Evaluation

Masters Project, 15 hecr, for students taking a one year degree
Självständigt arbete på magisternivå
Students will be associated with a research or development team, with a placement either in industry or in one of the laboratories engaged in language technology research at the University of Gothenburg. The project will normally include both a written report and an implementation.

Computational syntax, 7.5 hecr, for students taking a two year degree
Komputationell syntax
The course deals with central methods and techniques in the development of formal grammatical approaches to natural language, such as:
• phrase-structure grammar
• categorial grammar
• feature-based grammar formalisms
It also provides an overview of syntactic constructions and how they are to be described and implemented in a formal grammar.

Computational semantics, 7.5 hecr, for students taking a two year degree
Komputationell semantik
The course gives a basic introduction to model-theoretic semantics for natural language (as developed, for example, in Montague semantics and Discourse Representation Theory) and its implementation using logic and/or functional programming techniques. It also introduces theorem proving and its application to reasoning in natural language applications.

Parsing algorithms, 7.5 hecr
Parsingalgoritmer
The course deals with:
1. basic concepts in syntactic parsing
2. algorithms associated with finite-state methods, context-free grammars and feature-based grammars
3. parsing strategies such as top-down, bottom-up, stack-based and chart-based
4. the implementation of these strategies

Dialogue systems, 7.5 hecr
Dialogsystem
The course treats the theoretical foundations of dialogue systems and their construction, and gives an overview of currently existing systems and their functionality. Practical aspects of the course will use dialogue systems and development tools that have been developed in our lab and elsewhere, and students will build working end-to-end systems during the course.

Machine learning in NLP, 7.5 hecr
Maskininlärning för NLP
The course deals with:
1. basic methodology, including training and test data and evaluation methods
2. an overview of machine learning paradigms such as statistical methods, decision trees, memory-based learning, transformation-based learning and inductive logic programming
3. the application of these paradigms to language technology components such as part-of-speech tagging, parsing and word sense disambiguation

Language technology resources, 7.5 hecr
Språkteknologiska resurser
The focus of the course is on linguistic data resources:
1. corpus resources (text corpora of written or spoken language, speech databases, digitized video, etc.)
2. lexical resources
3. grammatical resources (these are the most difficult to treat separately from the tools for using them)
Students will do practical work with a number of such resources.

Information retrieval and extraction, 7.5 hecr
Informationssökning och informationsextraktion
The course gives a basic introduction to concepts, methods and algorithms for information retrieval and extraction. It focuses on a freely available platform and how this can be employed for a number of tasks, and in particular on how language technology of various kinds can be used to improve the system.

Knowledge representation and inference, 7.5 hecr
Kunskapsrepresentation och inferens
The course deals with knowledge representation and ontologies and their theoretical background, and provides an understanding of how ontologies can be used in language technology applications. The course also provides training in the use of tools and standards in such applications. Emphasis is placed on the practical use of tools and methods in individual or group work.

Masters Project, 30 hecr, for students taking a two year degree
Självständigt arbete på masternivå
Students will be associated with a research or development team, with a placement either in industry or in one of the laboratories engaged in language technology research at the University of Gothenburg. The project will normally include both a written report and an implementation.


5 Entrance prerequisites
Students with an undergraduate degree (at least three years of full-time study) in language technology, computational linguistics, computer science or linguistics (with at least 30 hecr of study in formal linguistics, corresponding to half a year of full-time study) are eligible to apply for this programme. Students with an undergraduate degree in cognitive science, languages, philosophy, software engineering, information technology or mathematics can also be considered, provided they can show a background in either programming or formal linguistics corresponding to 30 hecr (half a year of full-time study).

6 Examination
The examination of courses will be a mixture of traditional sit-down examinations, course papers, practical lab exercises and programming projects. There is a 15 hecr project requirement for the one-year degree and a 30 hecr project requirement for the two-year degree. Projects will normally result in an implementation and a report.

7 Additional information
The programme’s director of studies is responsible for ensuring that students’ views of the course are collected and that the evaluation results are considered in future course design.
