Radio frequency interference mitigation for software telescopes

VU University Amsterdam Faculty of Sciences Stichting Astron Research & Development University of Warsaw Faculty of Mathematics, Computer Science ...

Author: Oswin George


VU University Amsterdam Faculty of Sciences

Stichting Astron

Research & Development

University of Warsaw

Faculty of Mathematics, Computer Science and Mechanics

Joint Master of Science Programme

Tomasz Witaszczyk

Student no. 235854 (UW), 2002647 (VU)

Radio frequency interference mitigation for software telescopes.

Master's thesis in COMPUTER SCIENCE

Supervisors:

Dr. Rob V. van Nieuwpoort VU University Amsterdam

Dr. John W. Romein

ASTRON (Netherlands Institute for Radio Astronomy)

August 2010

Supervisor's statement

Hereby I confirm that the present thesis was prepared under my supervision and that it fulfills the requirements for the degree of Master of Computer Science.

Date

Supervisor's signature

Author's statement

Hereby I declare that the present thesis was prepared by me and none of its contents was obtained by means that are against the law. I also declare that the present thesis is a part of the Joint Master of Science Programme of the University of Warsaw and the Vrije Universiteit in Amsterdam. The thesis has never before been a subject of any procedure of obtaining an academic degree. Moreover, I declare that the present version of the thesis is identical to the attached electronic version.

Date

Author's signature

Abstract

Radio astronomy is a rapidly growing discipline of science. Astronomers keep building more powerful telescopes. The difficulties of building bigger parabolic dishes force them to switch to telescopes that consist of thousands of small antennas. Data from all antennas are later processed in a central processing unit. We call such telescopes software telescopes. LOFAR, which is currently under development, is one such telescope. Unfortunately, celestial objects are not the only sources of waves that can be received by radio telescopes. From the astronomers' point of view, all other sources are called Radio Frequency Interference (RFI). RFI mitigation is a challenging problem in radio astronomy. Several mitigation techniques have been developed over the years, but most of them operate offline on stored data. Online RFI mitigation is different from and more difficult than offline mitigation, as we have limited computing power and can look at only a small part of the data at one time. While some observations with LOFAR are done using only online processing, currently no RFI mitigation techniques are included in the software. Moreover, in some online processing pipelines, the sampled data from different stations are added, so if the data from one station is bad, the sum of the samples is bad as well and the output harms the astronomical data quality. Currently, no mechanism is available that detects and avoids this behavior. This thesis addresses this problem. To explore the possibilities of detecting RFI and misbehaving stations, the RFI Processing Library was developed and integrated with the existing LOFAR software correlator. Based on a comparison of efficiency and accuracy, the Threshold Blanking algorithm has been chosen as the recommendation for the LOFAR online software. As a tool for detecting and removing stations with corrupted data, the Pre Correlation Stations Detector algorithm has been chosen.

Keywords software telescope, RFI, mitigation, Radio Frequency Interference, interference, LOFAR, Blue Gene, signal processing, radioastronomy, library

Thesis domain (Socrates-Erasmus subject area codes) 11.3 Computer Science

Subject classification J. Computer Applications J.2 Physical Sciences and engineering J.2.2 Astronomy

Contents

Acknowledgments
Introduction
1. Basics of radio astronomy
  1.1. Overview
  1.2. Radio telescopes
2. LOFAR Telescope
  2.1. Concept of LOFAR
  2.2. Overall architecture
  2.3. Online Processing Software
3. Radio Frequency Interference
  3.1. Sources of RFI
  3.2. Pro-active mitigation strategies
  3.3. Reactive Mitigation Strategies
    3.3.1. Blanking in time
    3.3.2. Blanking in frequency
    3.3.3. Flagging
    3.3.4. Summary
  3.4. RFI Mitigation for LOFAR
4. RFI Processing Library
  4.1. Main concept
  4.2. Architecture of the library
  4.3. Using the library
    4.3.1. Pre-requisites
    4.3.2. Adapter classes
    4.3.3. Running the algorithms
    4.3.4. Feedback loop
    4.3.5. Statistics
  4.4. Related Libraries
5. RFI Removal Algorithms
  5.1. Rationale
  5.2. Implemented RFI algorithms
    5.2.1. Threshold Blanking
    5.2.2. Parametrized Threshold Blanking
    5.2.3. Var Threshold Blanking
    5.2.4. Sum Threshold Blanking
    5.2.5. Auto Threshold Blanking
    5.2.6. The APB Algorithm
    5.2.7. Threshold Flagging
    5.2.8. Var Threshold Flagging
  5.3. Adding algorithms to the RFI Processing Library
  5.4. Implemented station detectors
    5.4.1. PreCorrelation Detector
    5.4.2. PostCorrelation Detector
  5.5. Adding station detectors to the RFI Processing Library
6. Results obtained for LOFAR
  6.1. Integration with LOFAR correlator software
    6.1.1. Adapters
  6.2. Testing methodology
  6.3. Results for the post correlation algorithms
  6.4. Results for the pre correlation algorithms
    6.4.1. Frequency/Time row average
  6.5. Results for the station detectors
  6.6. Processing time
  6.7. Results of comparisons
    6.7.1. Comparisons between precorrelation algorithms
    6.7.2. Results obtained by the feedback mechanism
7. Conclusions
  7.1. Recommendation for LOFAR telescope
  7.2. Future Work
Bibliography

List of Figures

1.1. Electromagnetic Spectrum
1.2. Full-size replica of Jansky's telescope
1.3. Grote Reber's first radio telescope
1.4. Very Large Array
2.1. LOFAR Low Band Antennas
2.2. Overview of the LOFAR processing
2.3. Online processing pipelines
3.1. Example of RFI in time-frequency domain
4.1. Architecture of the library
6.1. Visualization of the LOFAR data on the model airplanes subband - reference
6.2. Visualization of the Threshold Flagging results with threshold = 0.005
6.3. Visualization of the Threshold Flagging results with threshold = 0.5
6.4. Visualization of the Threshold Flagging results with threshold = 0.05
6.5. Visualization of the Threshold Flagging results with threshold = 0.05 (clean subband)
6.6. Visualization of the Var Threshold Flagging results with threshold = 0.01 and window size = 4
6.7. Visualization of the Var Threshold Flagging results with threshold = 0.001 and window size = 11
6.8. Visualization of the Var Threshold Flagging results with threshold = 0.0005 and window size = 15
6.9. Percentage of marked samples by the Threshold Blanking algorithm
6.10. Percentage of marked samples by the Parametrized Threshold Blanking algorithm
6.11. Percentage of marked samples by the Var Threshold Blanking algorithm with window size = 3
6.12. Percentage of marked samples by the Var Threshold Blanking algorithm with threshold = 1200
6.13. Percentage of marked samples by the Sum Threshold Blanking algorithm with window size = 3
6.14. Percentage of marked samples by the Sum Threshold Blanking algorithm with threshold = 2000 * window size
6.15. Percentage of marked samples by the Auto Threshold Blanking algorithm (average variant)
6.16. Percentage of marked samples by the Auto Threshold Blanking algorithm (median variant)
6.17. Percentage of marked samples in a time row
6.18. Percentage of marked samples in a frequency row
6.19. Processing time for one subband and 5 stations - Precorrelation Algorithms
6.20. Processing time for one subband and 5 stations - Statistics
6.21. Processing time for one subband and 5 stations - Postcorrelation Algorithms

List of Tables

6.1. Pre Correlation StationDetector - percentage of marked stations on clear data
6.2. Pre Correlation Stations Detector - percentage of marked disturbed chunks of data
6.3. Comparison between precorrelation algorithms - TV Subband

Acknowledgments

I would like to thank my supervisor, Dr. Rob V. van Nieuwpoort, for all his help and support during this project. His advice, backed by his impressive knowledge of both computer science and astronomy, was invaluable, and without it I would not have been able to complete this project. No less importantly, Dr. van Nieuwpoort has a great sense of humour and it is a pleasure to work with him. I would also like to thank Dr. John W. Romein, my co-supervisor from the Netherlands Institute for Radio Astronomy (ASTRON). I am grateful for his valuable comments, which definitely helped to improve the thesis significantly. This thesis is the result of a joint project of VU Amsterdam and ASTRON. I would like to thank all the people at ASTRON for giving me the chance to work with the newest radio telescope, the opportunity to participate in the RFI Mitigation workshop, and all the support I received during the project. Special thanks go to the astronomers, especially Willem Baan, Andrei Offringa and Peter Fridman. They helped me to understand the basics of radio astronomy and gave precious advice that improved the final result.

Introduction

Astronomy is one of the oldest sciences in human history. However, until the middle of the 20th century, people were able to observe the sky only in visible light. Then astronomers discovered that celestial objects also emit waves in other parts of the electromagnetic spectrum, among them radio waves. This is how the discipline called radio astronomy began. The basics of radio astronomy can be found in chapter 1.

Until a few years ago, radio telescopes mostly looked like big parabolic dishes. As building ever bigger dishes became extremely difficult, astronomers had to find another way to make telescopes more sensitive. New telescopes use thousands of small antennas instead. Their signals are processed together in a central processing unit, which is often a software solution. We call such a telescope a software telescope. The LOw Frequency ARray (LOFAR), developed by ASTRON, the Netherlands Institute for Radio Astronomy, is one such radio telescope. A detailed description of LOFAR and its central processing software can be found in chapter 2.

Unfortunately, celestial objects are not the only sources of radio waves received by the sensitive telescope antennas. Satellites, aircraft, TV and radio stations, and many other sources emit waves that disturb astronomical observations. All such signals are called Radio Frequency Interference (RFI). A brief explanation of what RFI is and a review of existing RFI mitigation techniques can be found in chapter 3.

Some of the observations done with the LOFAR telescope, such as pulsar detection, are entirely online, so the received data is not stored anywhere for further processing. Therefore, RFI has a great impact on the astronomical data quality. Moreover, in the beam-forming processing pipeline, the sampled data from different stations is added. If the data from one station is bad, either because of RFI or because the station misbehaves, the sum of the samples is bad as well. Currently, no mechanism is available that detects and avoids this behavior. This thesis addresses this problem.

To investigate these issues, we created the generic RFI Processing Library. A description of the library's capabilities and a short manual can be found in chapter 4. Using the library, we can answer the questions of whether online mitigation is possible and efficient, and whether we need to change existing data structures in the LOFAR software. By integrating the library with the existing LOFAR source code, we can test the implemented techniques on real observation data. As many RFI mitigation techniques have been developed, the RFI Processing Library allows us to use many of them and compare them with each other. A list of the currently implemented algorithms and their descriptions can be found in chapter 5.

The beginning of chapter 6 describes how the RFI Processing Library has been integrated with the LOFAR software. The most important part of that chapter is the results obtained with the library for real observation data. Finally, chapter 7 gives a short analysis of the results and recommends techniques for online RFI mitigation for the LOFAR telescope.

Chapter 1 Basics of radio astronomy

1.1. Overview

According to Wikipedia [1], astronomy is the scientific study of celestial objects (such as stars, planets, comets, nebulae, star clusters and galaxies). The first observations of the night sky were made in ancient times, which makes astronomy one of the oldest sciences. Before the invention of the telescope, people could study only objects visible to the naked eye. They believed that the earth was the center of the universe and that everything seen in the sky revolved around it. This is known as the geocentric model. It was replaced during the Renaissance by Nicolaus Copernicus, who introduced the heliocentric model, in which the sun is the center of our solar system.

Figure 1.1: Electromagnetic Spectrum

The invention of the optical telescope was a great improvement for astronomical observations. Scientists were able to discover many new stars and other objects. Still, the only source of information about celestial objects was visible light, which is only a small part of the electromagnetic spectrum, as can be seen in Figure 1.1. Before the early 1930s, astronomers did not know that celestial objects also emit signals at other frequencies.

Figure 1.2: Full-size replica of Jansky's telescope

The first discovery of radio frequency signals from astronomical sources was made by Karl Jansky. Like many scientific discoveries, it was made while looking for something completely different. While working as an engineer at the Bell Telephone Laboratories, Jansky was investigating radio frequency interference from thunderstorms. To that end, he built an antenna that was tuned to respond to radiation at a wavelength of 14.6 meters and that rotated in a complete circle on old Ford tires every 20 minutes [2]. Figure 1.2 shows a full-size replica of Jansky's telescope. Apart from the signal coming from thunderstorms, he detected a signal from an unknown source. He found that the power of this unknown static changed in a complete cycle of 24 hours and was correlated with the earth's rotation. At first, Jansky thought that the source of this signal was the sun, but after further research he discovered that the source was the Milky Way, and he published his finding in 1933. More information about Jansky's discovery can be found in [2].

Figure 1.3: Grote Reber's first radio telescope

For the next few years after Jansky's discovery, no one paid much attention to it. The first person to pick up his finding was Grote Reber. He decided to build his own radio telescope and finished it in September 1937 [1]. As can be seen in Figure 1.3, his telescope looks more like modern radio telescopes. At first, he set up his telescope to receive signals at high frequencies, and he failed to find any signals from outer space. After this first failure, he modified his telescope twice to operate in lower frequency bands. In 1938 he successfully confirmed Jansky's discovery and focused on creating a radio-frequency sky map. He achieved this goal in 1941, producing the first sky map based on the non-visible spectrum.

1.2. Radio telescopes

Radio frequency waves from celestial sources that can pass through the atmosphere range from 10 MHz to 1000 GHz (wavelengths from about 30 meters down to 0.3 mm). These waves cannot be seen by the human eye, but they can be detected by sensitive radio antennas. Radio antennas used in astronomy are called radio telescopes. They can be used as a single antenna or in an array. Most of the telescopes currently in use look like large parabolic dishes that can be pointed at any point in the sky. The dishes reflect the waves and focus them onto a central point. An example of an array of such telescopes can be seen in Figure 1.4.
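The relation between the frequency limits of a band and the corresponding wavelengths is λ = c / f. The short sketch below evaluates it for the two ends of the frequency range quoted above; the function name `wavelength_m` is chosen here for illustration only and does not come from the thesis.

```python
# Speed of light in vacuum, in meters per second (exact SI value).
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def wavelength_m(frequency_hz: float) -> float:
    """Return the vacuum wavelength in meters for a frequency in hertz."""
    if frequency_hz <= 0:
        raise ValueError("frequency must be positive")
    return SPEED_OF_LIGHT_M_PER_S / frequency_hz


if __name__ == "__main__":
    # The two ends of the frequency range mentioned in the text.
    for label, freq in [("10 MHz", 10e6), ("1000 GHz", 1000e9)]:
        print(f"{label}: {wavelength_m(freq):.4g} m")
```

For these inputs the wavelengths come out at roughly 30 meters and 0.3 millimeters, respectively.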

Figure 1.4: Very Large Array

Using two or more telescopes together is called arraying. All the antennas in an array receive data simultaneously, and all the data is combined into one signal. This gives astronomers more detailed knowledge about celestial objects because, thanks to a technique called interferometry, all the telescopes can act as parts of one huge radio telescope. More information about arraying and interferometry can be found in [4] and [5]. Antennas are usually set up for a range of frequencies, also called a band. After the data is gathered by the antennas, analog-to-digital converters convert it into digital form. Then the data is processed with modern signal processing techniques. Until a few years ago, most digital signal processing was done using dedicated solutions such as FPGAs (Field Programmable Gate Arrays). This approach has several disadvantages:

• It is hard to modify.
• It is hard to implement.
• In case of efficiency problems, adding more devices is very expensive.

That is why many observatories currently process data using software solutions; such instruments are called software telescopes. Compared to hardware solutions, they are much easier to modify, and they can be parametrized in many ways. Efficiency problems can be solved simply by adding more processing units, without changing anything in the software.

Chapter 2 LOFAR Telescope

2.1. Concept of LOFAR

LOFAR stands for LOw Frequency ARray. It is an array radio telescope that operates in the low frequency band (10 - 250 MHz). The main difference from most modern radio telescopes is that, instead of big and expensive dishes, it consists of thousands of small antennas. Signals from all the antennas are combined in software running on an IBM BlueGene/P supercomputer. This chapter describes the concepts behind the LOFAR telescope and the architecture of the software solution.

2.2. Overall architecture

Figure 2.1: LOFAR Low Band Antennas

LOFAR currently consists of a 2-kilometer wide compact core area of 20 stations, 16 remote stations with a maximum distance of 125 km, and 8 international stations with a maximum distance of 1300 km [6]. In the near future LOFAR will have approximately 64 stations. Each station consists of two types of antennas: 48 to 96 Low Band Antennas (LBA) and 48 to 96 High Band Antennas (HBA), which gives a total of many thousands of antennas.

All antennas are dual-polarization. Low Band Antennas (see figure 2.1) operate in the 10-80 MHz band and High Band Antennas operate in the 110-250 MHz range. It is pointless to observe in the 80-110 MHz frequency range because of FM radio transmissions. Antennas are grouped in stations to create a hierarchical structure and to decrease the amount of transferred data. Combining all the data centrally would be too computationally expensive, so signals are combined locally, within the station, using FPGAs. The analog-to-digital conversion is done at the station, and then the signals from all the receivers inside the station are pre-processed.

Data from the stations is transmitted to the central processing unit over a Wide-Area Network; dedicated light paths were created for this purpose. UDP is used as the transport protocol. It is an unreliable protocol, but data loss can be tolerated, and TCP would be too expensive and too hard to implement on FPGAs. The transferred data consists of samples: complex numbers that represent the amplitude and phase of the received signal, encoded as complex integers.

After the data is received on the BlueGene/P supercomputer, it is filtered and split into smaller frequency ranges. Many pipelines work in parallel and are responsible for beam forming, correlating, etc., which shows the flexibility of the software processing solution. Online processing is described in more detail in section 2.3. Processed and correlated data is stored on the storage cluster for further processing, which can be done off-line.
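The sample format mentioned above (a complex integer that encodes amplitude and phase) can be illustrated with a minimal sketch. It assumes that a sample arrives as a pair of signed integers, one for the real part and one for the imaginary part; the actual bit layout of the LOFAR station data is not described here, and the function name `amplitude_and_phase` is hypothetical.

```python
import cmath
from typing import Tuple


def amplitude_and_phase(real: int, imag: int) -> Tuple[float, float]:
    """Return (amplitude, phase in radians) of one complex integer sample.

    Assumption: the sample is given as two signed integers (real, imag).
    """
    sample = complex(real, imag)
    return abs(sample), cmath.phase(sample)


if __name__ == "__main__":
    amplitude, phase = amplitude_and_phase(3, 4)
    print(f"amplitude = {amplitude}, phase = {phase:.4f} rad")
```

The amplitude is the modulus of the complex number and the phase is its argument, so both quantities are recovered from the integer pair without any extra information.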

Figure 2.2: Overview of the LOFAR processing

An overview of the data processing in LOFAR can be seen in figure 2.2. A more detailed description of the LOFAR architecture can be found in [6].

2.3. Online Processing Software

All the online processing of the LOFAR data is done on the BlueGene/P supercomputer. This supercomputer consists of two types of nodes: Input/Output nodes (called I/O nodes) and compute nodes. A dedicated application was created for each type: IONProc and CNProc, respectively. The main tasks of the IONProc application are to receive the station UDP data, to buffer the data for up to 2.5 seconds, and to forward it to the compute nodes [6]. After the data is processed on the compute nodes, the I/O nodes receive the results and send them to the storage nodes.

The I/O nodes chop the data streams that come from the stations into chunks of one frequency subband and approximately one second of time. Such a chunk is the unit of data that is sent to a compute node for further processing [6]. As processing a chunk takes longer than one second, more than one compute node is needed, and the chunks have to be distributed over them. Scheduling is done using a round-robin algorithm: a compute node receives a chunk, processes it, sends the results back and waits in the queue for the next chunk. Before a compute node can perform the actual computations, a data exchange has to take place: each I/O node receives all frequency subbands from one station, while each compute node requires one subband from all stations. After the data exchange is finished, each compute node can perform the signal processing. As we can see in figure 2.2, there are several processing pipelines, among others:

• Pulsar detection pipeline.
• Epoch of Re-ionization pipeline.
• Imaging pipeline.
• Transient sources pipeline.

The results from some of the pipelines are stored on the external storage system, but the common part of all the pipelines is online processing. Online processing consists of several steps, all of which can be seen in figure 2.3. Each pipeline uses only a subset of those steps. For example, the correlation step is not used during pulsar detection. A short description of all the steps follows.

The first step is data conversion. The samples that come from the I/O nodes are 4-bit, 8-bit or 16-bit integers. As the BlueGene is much better at floating-point operations, they are converted into 32-bit big-endian floating-point numbers. The conversion is done after the data exchange, to decrease the size of the data sent between nodes.

Next, the converted data is processed by a Poly-Phase Filter bank (PPF). The main task of the PPF is to split a frequency subband into a number of narrower frequency channels. After this step, we have a higher frequency resolution, but the time resolution is lower, so the data size does not increase. The PPF consists of two parts:

• A Finite Impulse Response (FIR) filter, which multiplies each sample by a real weight factor generated on the fly.
• A Fast Fourier Transform (FFT), which transforms the original function in the time domain into a function in the frequency domain.

LOFAR stations are located at many different geographical locations, so radio waves from celestial sources arrive at them at different times. Therefore, all signals have to be shifted before further processing, which is done in the phase shift correction step.

The bandpass correction step compensates for an artifact introduced by a filter bank that runs on the FPGAs in the stations [6]. During this step, each sample is once again multiplied by a real, channel-dependent value. This cannot be done during preprocessing at the station, as the artifact is visible only in the data processed by the PPF.

Superstation beam forming adds the samples from a group of stations that are geographically close, so that they form a virtual superstation with increased sensitivity.
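Of the steps described above, the poly-phase filter bank is probably the least obvious, so a minimal sketch may help. It uses the common "weight, fold and transform" formulation: a window of `n_channels * n_taps` samples is multiplied by real FIR weights, folded into `n_channels` sums, and passed through a Fourier transform. This is an illustration only, not the LOFAR implementation: the Hamming-windowed sinc weights in `prototype_weights`, the naive `dft` (standing in for an FFT), and all function names are assumptions made here.

```python
import cmath
import math
from typing import List, Sequence


def prototype_weights(n_channels: int, n_taps: int) -> List[float]:
    """Illustrative real FIR weights: a Hamming-windowed sinc (assumption)."""
    length = n_channels * n_taps
    if length < 2:
        raise ValueError("need at least two weights")
    weights = []
    for n in range(length):
        x = (n - (length - 1) / 2) / n_channels
        sinc = 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)
        hamming = 0.54 - 0.46 * math.cos(2 * math.pi * n / (length - 1))
        weights.append(sinc * hamming)
    return weights


def dft(values: Sequence[complex]) -> List[complex]:
    """Naive O(N^2) discrete Fourier transform; an FFT gives the same result."""
    n = len(values)
    return [
        sum(v * cmath.exp(-2j * cmath.pi * k * i / n) for i, v in enumerate(values))
        for k in range(n)
    ]


def polyphase_filter_bank(
    samples: Sequence[complex], weights: Sequence[float], n_channels: int
) -> List[List[complex]]:
    """Split a time series into n_channels frequency channels.

    One output spectrum is produced per n_channels input samples, so the
    frequency resolution goes up while the time resolution goes down.
    """
    if n_channels <= 0 or not weights or len(weights) % n_channels != 0:
        raise ValueError("len(weights) must be a positive multiple of n_channels")
    n_taps = len(weights) // n_channels
    window = n_channels * n_taps
    spectra = []
    for start in range(0, len(samples) - window + 1, n_channels):
        block = samples[start:start + window]
        # FIR part: weight every sample, then add up the n_taps segments.
        folded = [
            sum(weights[t * n_channels + p] * block[t * n_channels + p]
                for t in range(n_taps))
            for p in range(n_channels)
        ]
        # Fourier part: time domain to frequency domain.
        spectra.append(dft(folded))
    return spectra


if __name__ == "__main__":
    n_channels, n_taps = 8, 4
    # A complex tone centered on channel 2 (test signal, not real data).
    tone = [cmath.exp(2j * cmath.pi * 2 * n / n_channels) for n in range(64)]
    spectra = polyphase_filter_bank(
        tone, prototype_weights(n_channels, n_taps), n_channels
    )
    powers = [abs(c) ** 2 for c in spectra[0]]
    print("spectra:", len(spectra), "strongest channel:", powers.index(max(powers)))
```

With a single tap and unit weights the function reduces to a plain block-wise Fourier transform; the extra taps are what sharpen the channel response.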

Figure 2.3: Online processing pipelines

The most computationally expensive operation is correlation. During this step, samples from single stations (or virtual superstations) are correlated. As the signals from celestial sources are very weak and single antennas receive mainly noise, it is essential to find the statistical coherence. The samples of each pair of stations are correlated by multiplying the sample of one station by the complex conjugate of the sample of the other station [6]. A more detailed description of each step and of the correlator software as a whole can be found in [6]. An explanation of standard signal processing methods can be found in [7].
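The correlation step can be sketched in a few lines. The multiplication with the complex conjugate follows the description above; summing the products over all samples of a chunk (integration in time) and including each station paired with itself are assumptions made for this illustration, and the names `correlate` and `correlate_all` are hypothetical.

```python
from itertools import combinations_with_replacement
from typing import Dict, List, Sequence, Tuple


def correlate(station_a: Sequence[complex], station_b: Sequence[complex]) -> complex:
    """Multiply each sample of A by the conjugate of B's sample and sum up.

    Assumption: the products are integrated (summed) over the whole chunk.
    """
    if len(station_a) != len(station_b):
        raise ValueError("both stations must provide the same number of samples")
    return sum((a * b.conjugate() for a, b in zip(station_a, station_b)), 0j)


def correlate_all(stations: List[Sequence[complex]]) -> Dict[Tuple[int, int], complex]:
    """Correlate every pair of stations (i, j) with i <= j.

    Assumption: autocorrelations (i == j) are included in the result.
    """
    return {
        (i, j): correlate(stations[i], stations[j])
        for i, j in combinations_with_replacement(range(len(stations)), 2)
    }


if __name__ == "__main__":
    # Made-up samples for two stations, one channel, two time steps.
    station_0 = [1 + 1j, 2 + 0j]
    station_1 = [0 + 1j, 1 + 0j]
    for pair, value in correlate_all([station_0, station_1]).items():
        print(pair, value)
```

The number of station pairs grows quadratically with the number of stations, which is consistent with correlation being the most expensive step.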


Chapter 3 Radio Frequency Interference

Unfortunately for radio astronomy, celestial objects are not the only sources of radio waves that can be observed on earth. Over years of technological progress, people have developed devices that emit radio waves, such as TV antennas. In addition, human activity, such as air traffic, causes reflections of existing waves. All those artificial waves are called Radio Frequency Interference (RFI). Waves emitted and reflected as a result of human activity interfere with those emitted by celestial objects, making observation much harder. This chapter describes the main sources of RFI, as well as the main strategies used to solve this problem. An example of RFI in the time-frequency domain can be seen in figure 3.1.

Figure 3.1: Example of RFI in time-frequency domain

3.1. Sources of RFI

According to [9], we can distinguish four main categories of RFI sources:

• Satellites
Satellites are a serious problem for radio observations. Some of them transmit signals strong enough to even destroy sensitive telescope antennas, so during an observation we have to be very careful to avoid receiving signals from them. Luckily, their position in orbit can be easily determined, so we can schedule observations accordingly. However, signals from some sources, such as GPS satellites, can always be received.

• Aircraft
Transmissions from flying airplanes are very short-term, so during long-term averaged observations they can be ignored. However, they can affect some observations and cannot be easily predicted.

• Ground-based
There are plenty of ground-based RFI sources, such as TV and FM antennas, cell phone towers etc. Apart from devices whose main purpose is to emit radio waves, any electronic installation can affect observations if it is close enough to the observatory. Astronomers try to build observatories as far from all such sources as they can, but it is not possible to avoid all ground-based RFI.

• Observatory-based
Modern observatories contain a lot of high-end electronic and computing installations, so they are sources of some RFI themselves.

3.2. Pro-active mitigation strategies

By pro-active mitigation strategies we mean all strategies that aim to avoid RFI in the first place. Those are the best strategies, because with a clean electromagnetic spectrum, RFI is no longer a problem. Unfortunately, some RFI, like GPS signals, cannot be avoided. Examples of such strategies are:

• Regulation
Many different groups want to use some part of the electromagnetic spectrum for their own purposes. They need it for all sorts of wireless communication, data broadcasting, positioning systems etc. However, at the request of radio astronomers, some spectral bands are reserved for the needs of astronomy, so observations in those bands can be done without the impact of RFI.

• Radio Quiet Zones
In cooperation with local authorities, astronomical observatories are often placed in so-called 'radio quiet zones', which protect them from some ground-based RFI. Within a radius of a few kilometers from the antennas, the use of all radio wave emitters is prohibited. Sometimes, radio quiet zones forbid only the use of specific frequencies, as observations are done using only this specific part of the spectrum. Still, this strategy does not help with some types of RFI, such as satellites.

• The Observatory Environment
A significant part of the radio interference in observations comes from the observatories themselves, because high-end electronic devices emit electromagnetic waves. Solutions include extensive RFI shielding around the emitting devices, screened rooms and RFI-tight cabinets. Everything has to be monitored full-time to avoid any leaks.

3.3. Reactive Mitigation Strategies

By reactive mitigation strategies we mean all strategies in which we detect RFI in the data stream received from the antennas. Then, we remove the marked samples from the data stream or adjust the level of the affected data.

3.3.1. Blanking in time

Blanking in time is the most popular reactive strategy. It can be done on analog, preprocessed data as well as on digitized samples. The main idea is that the observer sets a threshold level that is used to distinguish RFI from RFI-free data. In the simplest variant we iterate over all the data in time order, and samples with values above the threshold are marked. We can also implement more complex solutions that make decisions based on mean values (past and future), standard deviations etc. Blanking in time has many advantages, it is:

• simple to understand,
• easy to implement,
• fast (has low complexity),
• simple to automate,
• quite effective.

As astronomy pipelines are already compute intensive, low complexity is the most important reason for choosing this strategy. Examples of implemented blanking algorithms can be found in [10].

3.3.2. Blanking in frequency

Because modern software telescopes are real-time systems, it is sometimes impossible to implement sophisticated blanking in time strategies. For example, we cannot make a decision based on the values of future samples. Instead, it is sometimes a better strategy to iterate over the data in frequency order, identifying the frequencies that contain RFI.

3.3.3. Flagging

Flagging is very similar to blanking, with one big difference: flagging operates on data after correlation. Flagging is currently done offline on the LOFAR telescope. Details can be found in [8].

3.3.4. Summary

Some additional, less popular techniques have been developed over the years. Examples are:

• Null Steering.
• Adaptive Filters. Adaptive filters can be used when a copy of the RFI is available. For example, we can record the data from nearby electronic emitters. Then we can match the data received from the telescope antennas against the recorded copy and remove the interference. More details can be found in [11].
• Mitigation in array imaging stage.
• Spatial filtering.

Choosing the most adequate strategy depends on many factors, and there is no single universal strategy that is suitable for all cases.

3.4. RFI Mitigation for LOFAR

Currently there is no online RFI mitigation for the LOFAR telescope. This problem affects mainly the science pipelines that are computed only online, like the pulsar detection pipeline. Still, even for the other pipelines, the data on the storage nodes are saved without any RFI mitigation applied. A solution for offline RFI mitigation for LOFAR data has been created (Offringa et al. [8]). It is software that detects and removes RFI in radio measurement sets, a standard file type for storing radio data that is also used in LOFAR. The algorithms implemented in Offringa's RFI software are:

• Threshold Flagging
• Var Threshold Flagging
• Sum Threshold Flagging
• CUSUM method
• Surface fitting and smoothing
• Singular Value Decomposition

The first three of them are described in section 5.2. The rest are described in [8]. It is very hard to use Surface fitting and smoothing or Singular Value Decomposition for online mitigation purposes, as they are more computationally expensive: only the first four algorithms have linear complexity [8].


Chapter 4 RFI Processing Library

4.1. Main concept

As explained in the previous chapter, there are many techniques to mitigate radio frequency interference. Choosing the most suitable technique depends on many different factors for each telescope, such as the signal-to-noise ratio, the characteristics of the RFI and others. To make this decision easier, we have decided to create a generic RFI processing library that can be easily integrated with any existing software telescope. The main goals for the library are:

• It has to be written in a programming language commonly used for software telescopes.
• It has to be data format independent.
• Adding RFI detection algorithms or misbehaving station detection algorithms to the library has to be relatively simple.
• It has to enable comparisons of algorithms.

Because C++ is currently the language most commonly chosen for software telescopes (e.g. LOFAR, VLA), we have decided to use it for the implementation. To provide independence from the data format used in a telescope, the library is based on metaprogramming concepts. In order to use it, an adapter class for the data has to be implemented. This class is responsible for creating data iterators, on which the algorithms operate. To add an algorithm to the library, we simply write a class that inherits from the base class for algorithms and overrides the detect() method. When we have algorithms that we want to test, we can add as many of them as we want to the main class of the library and run it. As a result we obtain many useful statistics, such as the percentage of marked samples (also per time / frequency row) and relative comparisons of each pair of algorithms.

4.2. Architecture of the library

The RFI Processing Library is a template-based library. Most of the classes are templates with an adapter class, a data iterator and a sample type as template parameters. More details about the concept that has to be fulfilled by the template parameters can be found in section 4.3.2. Because the design of the library is generic, it can be easily integrated with any existing system, no matter what the types of the samples are and how they are kept in memory.

Figure 4.1: Architecture of the library

The main class in the library is the InterferenceMitigator. It is responsible for scheduling the detection process of all the algorithms and keeping the results. This class can also write the results to a given output stream. InterferenceMitigator owns collections of both Pre- and PostCorrelationAlgorithms. Those algorithms are used during the detection process. Results for each algorithm are stored in instances of the Result class, while results of comparisons between them can be found in instances of the ResultsCompared class. One instance represents the results from one algorithm or one pair of algorithms.

PreCorrelationAlgorithm and PostCorrelationAlgorithm are abstract template classes for algorithms. All algorithms have to inherit from one of those classes and override the detect() method. Marking samples has to be done using the mark() method.

CorrelatedDataAdapter and UncorrelatedDataAdapter are base classes that provide an interface for adapters. Data adapters written by the user can, but do not have to, inherit from those classes.

StationDetector is the abstract template class for algorithms that detect misbehaving stations, using data either before or after correlation. All detection algorithms have to inherit from this class (with either a correlated or an uncorrelated data adapter as a template parameter) and override the detect() method.

4.3. Using the library

4.3.1. Pre-requisites

In order to use the library, some pre-requisites have to be fulfilled:

• The telescope software has to be written in C++ or in another language that can be linked with C++.
• The signals from the telescope have to be transformed by the Fourier transformation and represented as complex numbers.
• The sample type has to be comparable with double.
• The Boost library has to be installed on the system [15].

4.3.2. Adapter classes

If the pre-requisites are fulfilled, we have to create two adapter classes:

• Adapter of precorrelation data.
• Adapter of postcorrelation data.

Adapter classes are a standard use of the Adapter design pattern. The main responsibility of those classes is to adapt the data, uncorrelated or correlated respectively, to a format that can be understood by the library. To achieve this goal, each of those classes has to implement the appropriate concept,

• for uncorrelated data:

class UncorrelatedDataAdapter {
public:
    unsigned int getNrOfChannels() const;
    unsigned int getNrOfPolarizations() const;
    unsigned int getNrOfStations() const;
    unsigned int getNrOfSamples() const;

    void getIteratorsFrequency(IteratorType &begin, IteratorType &end,
                               int stationNr, int polarizationNr, int sampleNr);
    void getIteratorsTime(IteratorType &begin, IteratorType &end,
                          int stationNr, int polarizationNr, int channelNr);
    void getIteratorsStations(IteratorType &begin, IteratorType &end,
                              int polarizationNr, int channelNr, int sampleNr);
};

• for correlated data:

class CorrelatedDataAdapter {
public:
    unsigned int getNrOfChannels() const;
    unsigned int getNrOfPolarizations() const;
    unsigned int getNrOfStations() const;
    unsigned int getNrOfSamples() const;

    void getIteratorsFrequency(IteratorType &begin, IteratorType &end,
                               int stationNr1, int stationNr2,
                               int polarizationNr1, int polarizationNr2, int sampleNr);
    void getIteratorsTime(IteratorType &begin, IteratorType &end,
                          int stationNr1, int stationNr2,
                          int polarizationNr1, int polarizationNr2, int channelNr);
    void getIteratorsBaselines(IteratorType &begin, IteratorType &end,
                               int polarizationNr1, int polarizationNr2,
                               int channelNr, int sampleNr);
};

The methods getNrOfChannels(), getNrOfPolarizations(), getNrOfSamples() and getNrOfStations() simply return information about the size of the data. The most important methods in both cases are getIteratorsTime() and getIteratorsFrequency(). For each of these methods, the first two parameters (IteratorType &begin, IteratorType &end) are output parameters: the method is responsible for storing in them iterators pointing to the beginning and the end of the data. All the remaining parameters are input parameters. They are used to specify over what part of the data we want to iterate. The only difference between the methods for correlated and uncorrelated data is that we have to specify information for one station in the uncorrelated case, and for two stations in the correlated case. The methods getIteratorsStations() and getIteratorsBaselines() have to be implemented only if we want to use the feature that allows us to detect and remove all the data from a particular station. In addition, in adapters we can define two public types:

• IteratorType
• SampleType

The advantage of doing so is that those are the default template parameters for mitigator objects and algorithms, so we do not have to specify them explicitly, which makes the library much easier to use from the programmer's point of view.
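To illustrate the concept, a minimal adapter for uncorrelated data kept in a flat std::vector might look like the sketch below. Everything in it is an assumption made for illustration: the names StridedIterator and VectorUncorrelatedAdapter, the memory layout [station][polarization][channel][sample], and the choice to omit the optional getIteratorsStations() method. It is not part of the library.

```cpp
#include <complex>
#include <cstddef>
#include <vector>

// Forward iterator that walks a flat buffer with a fixed stride.
// It stores an index instead of a raw pointer, so the end iterator
// never requires forming an out-of-range pointer.
template <typename T>
class StridedIterator {
public:
    StridedIterator() : base_(nullptr), pos_(0), stride_(1) {}
    StridedIterator(T* base, std::size_t pos, std::size_t stride)
        : base_(base), pos_(pos), stride_(stride) {}
    T& operator*() const { return base_[pos_]; }
    StridedIterator& operator++() { pos_ += stride_; return *this; }
    bool operator==(const StridedIterator& other) const { return pos_ == other.pos_; }
    bool operator!=(const StridedIterator& other) const { return pos_ != other.pos_; }
private:
    T* base_;
    std::size_t pos_;
    std::size_t stride_;
};

// Hypothetical adapter; data layout: [station][polarization][channel][sample].
template <typename Sample>
class VectorUncorrelatedAdapter {
public:
    typedef Sample SampleType;                      // public types used as defaults
    typedef StridedIterator<Sample> IteratorType;

    VectorUncorrelatedAdapter(std::vector<Sample>& data, unsigned int stations,
                              unsigned int polarizations, unsigned int channels,
                              unsigned int samples)
        : data_(data), stations_(stations), polarizations_(polarizations),
          channels_(channels), samples_(samples) {}

    unsigned int getNrOfChannels() const { return channels_; }
    unsigned int getNrOfPolarizations() const { return polarizations_; }
    unsigned int getNrOfStations() const { return stations_; }
    unsigned int getNrOfSamples() const { return samples_; }

    // Consecutive channels are samples_ elements apart.
    void getIteratorsFrequency(IteratorType& begin, IteratorType& end,
                               int stationNr, int polarizationNr, int sampleNr) {
        const std::size_t first = index(stationNr, polarizationNr, 0, sampleNr);
        begin = IteratorType(data_.data(), first, samples_);
        end = IteratorType(data_.data(),
                           first + std::size_t(channels_) * samples_, samples_);
    }

    // Consecutive time samples are adjacent in memory.
    void getIteratorsTime(IteratorType& begin, IteratorType& end,
                          int stationNr, int polarizationNr, int channelNr) {
        const std::size_t first = index(stationNr, polarizationNr, channelNr, 0);
        begin = IteratorType(data_.data(), first, 1);
        end = IteratorType(data_.data(), first + samples_, 1);
    }

private:
    std::size_t index(int station, int polarization, int channel, int sample) const {
        return ((std::size_t(station) * polarizations_ + polarization) * channels_
                + channel) * samples_ + sample;
    }
    std::vector<Sample>& data_;
    unsigned int stations_, polarizations_, channels_, samples_;
};
```

The adapter only describes how to walk the existing buffer; no samples are copied, which matters for a real-time pipeline.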

4.3.3. Running the algorithms

When we have both adapters implemented, we can run the mitigation software. The first step is to create the main mitigation object, an instance of the class InterferenceMitigator. The most important elements of this class can be seen below:

template<
    typename UncorrelatedDataAdapter,
    typename CorrelatedDataAdapter,
    typename UncorrelatedDataIterator = typename UncorrelatedDataAdapter::IteratorType,
    typename UncorrelatedSampleType = typename UncorrelatedDataAdapter::SampleType,
    typename CorrelatedDataIterator = typename CorrelatedDataAdapter::IteratorType,
    typename CorrelatedSampleType = typename CorrelatedDataAdapter::SampleType>
class InterferenceMitigator {
public:
    typedef PreCorrelationAlgorithm UncorrelatedAlgorithm;
    typedef PostCorrelationAlgorithm CorrelatedAlgorithm;
    typedef StationDetector UncorrelatedDetector;
    typedef StationDetector CorrelatedDetector;

    InterferenceMitigator();

    /* RFI detection */
    void addAlgorithm(CorrelatedAlgorithm &algorithm);
    void addAlgorithm(UncorrelatedAlgorithm &algorithm);
    void runUncorrelated(UncorrelatedDataAdapter &uncorrelatedAdapter);
    void runCorrelated(CorrelatedDataAdapter &correlatedAdapter);

    /* Bad stations detection */
    void addCorrDetector(CorrelatedDetector &detector);
    void addUncorrDetector(UncorrelatedDetector &detector);
    void runStationsCorr(CorrelatedDataAdapter &adapter);
    void runStationsUncorr(UncorrelatedDataAdapter &adapter);

    /* Results */
    void outputToFile(std::string &file);
    void outputToFile(const char *file);
    void output();
};


As we can see in the listing, the InterferenceMitigator class is a template with six parameters. To create an instance of this class, we have to define the types of the adapter classes, the iterators over both correlated and uncorrelated data, and the sample types. However, if we have defined IteratorType and SampleType inside the adapter classes as specified in section 4.3.2, we have to pass only two parameters, so the typical use of this class looks like InterferenceMitigator. After the instance of the main object has been created, we can add algorithms. We have two types of algorithms:

• Algorithms working on uncorrelated data.
• Algorithms working on correlated data.

For each of them there is an abstract template class called, respectively, PreCorrelationAlgorithm and PostCorrelationAlgorithm. The signatures of those classes look as follows:

• PreCorrelationAlgorithm

template class PreCorrelationAlgorithm {};

• PostCorrelationAlgorithm

template class PostCorrelationAlgorithm {};

As we can see above, the main template parameter in both cases is DataAdapter, and once again, if we have the public types defined in the adapter, it is the only template parameter. PostCorrelationAlgorithm and PreCorrelationAlgorithm are abstract classes, so we cannot create instances of them. All the algorithms in the library inherit from one of those classes. To use an algorithm, we first have to create an instance of one of the derived classes. The list of algorithms currently implemented in the library can be found in chapter 5.

Once created, algorithms can be added to the main object using the overloaded method addAlgorithm(). There is no limit on the number of tested algorithms, but the complexity of comparing them is O(n^2 k), where n is the number of algorithms and k is the number of samples. This comes from the fact that we have to compare each pair of them and there are n(n − 1)/2 pairs. Algorithms can be parametrized: for example, a simple threshold algorithm takes the value of a threshold. Parameters are passed during construction of an object. Comparing the effectiveness of one algorithm with two different parameter values is not a problem, because we can add many instances of one class to a single mitigator object.

After setting up the collection of algorithms, we can run the detection process. This is done simply by calling either the runCorrelated() or the runUncorrelated() method of the mitigator object with instances of data adapters as parameters. All detection and comparison processes are included in those methods.

If we want to detect misbehaving stations, we can add detector objects to the mitigator. Just like in the case of RFI algorithms, we have two types of detectors:

• Detectors working on uncorrelated data.
• Detectors working on correlated data.

Each detector has to inherit from the StationDetector class, independently of the data type used. The signature of this class looks as follows:

template class StationDetector {};

Once again, the main template parameter is DataAdapter. By checking the type of the adapter, we can determine whether a particular detection algorithm works on the data before or after the correlation process. The list of detectors currently implemented in the library can be found in chapter 5.

The results and statistics of all the detection processes are stored inside the mitigator object. In order to see them we should use the output() method. By default, results are written to the std::cerr stream. If we want to redirect them to a file, we should first call the outputToFile() method with the name of the file as a parameter. A simple, complete example of using the RFI Processing Library can be found below:

SampleData data;
SampleDataAdapter adapter(data);

PreCorrelationThresholdAlgorithm preThreshold(20);
PostCorrelationThresholdAlgorithm postThreshold(20);
CorrDetector postDetector(2, 0.5, 3);
UncorrDetector preDetector(2, 0.5);

InterferenceMitigator mitigator;
mitigator.addAlgorithm(preThreshold);
mitigator.addAlgorithm(postThreshold);
mitigator.addCorrDetector(postDetector);
mitigator.addUncorrDetector(preDetector);

mitigator.runUncorrelated(adapter);
mitigator.runCorrelated(adapter);
mitigator.runStationsCorr(adapter);
mitigator.runStationsUncorr(adapter);
mitigator.output();

4.3.4. Feedback loop

While some processing pipelines do not use correlated data at all, it is easier to detect some RFI (for example RFI located close to only one of the stations) in correlated data. To make this possible we have created a mechanism in the library called the feedback loop. The feedback loop analyzes the results of the detection process performed on correlated data in order to mark uncorrelated samples, so that the uncorrelated data benefits from post-correlation RFI detection. Then, by keeping the results of this process in memory, the marks can be used in online processing pipelines (e.g. beam forming mode) that use only uncorrelated data. The main idea is to check whether the sample for a particular moment/channel was marked by the post-correlation algorithm on all the baselines that include a given station. If this condition is fulfilled, we mark the sample for the given station/moment/channel in the uncorrelated data. The algorithm of the feedback mechanism looks as follows:

for each polarization
    for each sample in time
        for each frequency channel
            for each station
                if sample is marked for all baselines including this station
                    if data is integrated over time
                        mark all integrated samples for this channel/station/polarization
                    else
                        mark particular sample for this channel/station/polarization

As we can see above, the feedback loop can work with data that is either integrated or not integrated in the time domain. The second advantage of having feedback is that it makes comparisons between correlated and uncorrelated detection possible: we can compare the results of RFI detection on uncorrelated data with the marks that the feedback loop derives for the same uncorrelated samples from the correlated data.
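The pseudo code above can be sketched in C++ for a single polarization. The flag containers, the function name feedbackLoop() and the explicit list of baselines are assumptions made for this illustration; the library keeps its own internal structures, which are not shown in this text.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// [outer][channel][time]; outer is a baseline or a station (illustrative layout).
typedef std::vector<std::vector<std::vector<bool> > > FlagCube;

// corrFlags[b][ch][t]: flag of baseline b after correlation.
// baselines[b]: the two station numbers that form baseline b.
// integration: number of uncorrelated samples integrated into one
//              correlated sample (1 means no integration).
// Returns flags indexed as [station][ch][t * integration + k].
FlagCube feedbackLoop(const FlagCube& corrFlags,
                      const std::vector<std::pair<int, int> >& baselines,
                      std::size_t nrStations, std::size_t integration)
{
    const std::size_t nrChannels = corrFlags.empty() ? 0 : corrFlags[0].size();
    const std::size_t nrTimes = nrChannels == 0 ? 0 : corrFlags[0][0].size();
    FlagCube uncorr(nrStations, std::vector<std::vector<bool> >(
        nrChannels, std::vector<bool>(nrTimes * integration, false)));

    for (std::size_t t = 0; t < nrTimes; ++t) {
        for (std::size_t ch = 0; ch < nrChannels; ++ch) {
            for (std::size_t st = 0; st < nrStations; ++st) {
                bool any = false;   // station takes part in at least one baseline
                bool all = true;    // all of its baselines are marked
                for (std::size_t b = 0; b < baselines.size(); ++b) {
                    const bool includes =
                        baselines[b].first == static_cast<int>(st) ||
                        baselines[b].second == static_cast<int>(st);
                    if (!includes) continue;
                    any = true;
                    if (!corrFlags[b][ch][t]) { all = false; break; }
                }
                if (any && all) {
                    // Mark every uncorrelated sample covered by this time slot.
                    for (std::size_t k = 0; k < integration; ++k)
                        uncorr[st][ch][t * integration + k] = true;
                }
            }
        }
    }
    return uncorr;
}
```

With integration equal to 1 the inner loop marks a single sample, which corresponds to the non-integrated branch of the pseudo code.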

4.3.5. Statistics

As a result of the detection process we get the following set of statistics for each algorithm:

• The number of processed samples.
• The number of marked samples.
• The percentage of marked samples.
• The percentage of marked samples in a time row in which any sample was marked.
• The percentage of marked samples in a frequency row in which any sample was marked.
• The number of marked samples per station.

As a result of the comparison process we get the following set of statistics for each pair of algorithms:

• The percentage of samples marked by the first algorithm that were also marked by the second algorithm.
• The percentage of samples marked by the second algorithm that were also marked by the first algorithm.

For each station detection algorithm we get, for each station, the number of seconds during which the station was recognized as misbehaving. Below we can see sample output for two uncorrelated RFI algorithms and one station detector:

Uncorrelated results:
0) Samples: 118272000, Marked: 0.268176, FreqRovAvg: 2.2243, TimeRowAvg: 1.52696
Results per station: 8224 1506 25242 19 282186
1) Samples: 118272000, Marked: 0.704076, FreqRovAvg: 4.98852, TimeRowAvg: 3.03029
Results per station: 14962 5095 40935 694 771039
0, 1) First_To_Second ratio: 1, Second_To_First ratio: 0.388596
Uncorrelated Results:
0) 0 0 5 0 40
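The pairwise comparison statistic can be illustrated with a short sketch. The function name markedAlsoBy() and the representation of the marks as std::vector<bool> are assumptions made here; the function returns a fraction between 0 and 1, and returning 0 when the first algorithm marked nothing is a convention chosen only for this example.

```cpp
#include <cstddef>
#include <vector>

// Fraction of the samples marked by algorithm `a` that were also
// marked by algorithm `b`. One pass over the k samples per pair.
double markedAlsoBy(const std::vector<bool>& a, const std::vector<bool>& b)
{
    std::size_t markedA = 0;
    std::size_t both = 0;
    const std::size_t n = a.size() < b.size() ? a.size() : b.size();
    for (std::size_t i = 0; i < n; ++i) {
        if (a[i]) {
            ++markedA;
            if (b[i]) ++both;
        }
    }
    return markedA == 0 ? 0.0 : static_cast<double>(both) / markedA;
}
```

Calling the function twice with the arguments swapped gives both directions of the comparison. A value of 1 in one direction together with a smaller value in the other means that the second algorithm marked everything the first one marked, plus additional samples.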

4.4. Related Libraries

As software correlators are relatively new, very few similar libraries have been implemented. One similar product is Offringa's project [8], already described in section 3.4. The difference between it and the RFI Processing Library described here is that it can work only offline, on stored data after correlation, while the RFI Processing Library can be integrated into any online pipeline and detect RFI in data both before and after correlation. A second approach to RFI mitigation in software has been used in the DiFX correlator, which has served as a testbed for online RFI algorithms. Still, while the RFI Processing Library can be integrated with any software correlator, the solution created for the DiFX correlator is designed to work only within that correlator's software. Currently, it uses only the Spectral Kurtosis approach to detect RFI. A description of the use of the DiFX correlator as a testbed for RFI algorithms can be found in [12].


Chapter 5 RFI Removal Algorithms

5.1. Rationale

The RFI Processing Library was created to see whether it is possible to implement online RFI mitigation on the LOFAR telescope. As correlating samples is already very computationally expensive and is a real-time process, algorithms used for RFI mitigation have to be extremely fast and effective. For LOFAR purposes, we can afford only a few floating point operations per sample. Because of that, the algorithms chosen for implementation are relatively simple.

5.2. Implemented RFI algorithms

5.2.1. Threshold Blanking

This is the simplest algorithm in the library. The main idea is to mark samples whose real part is above a given level. The only parameter of this algorithm is the value of the threshold. A description of the algorithm in pseudo code can be found below:

for each station
    for each polarization
        for each moment in time domain
            get iterator over frequency
            for each iterated sample
                if real part of sample exceeds threshold
                    mark sample

The algorithm is very simple, yet it handles short bursts of RFI accurately. The pseudo code above uses iterators over the frequency domain, but the algorithm can be applied in both the time and the frequency domain, and the results are the same.
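The innermost loop of the pseudo code can be sketched as a small function template. The name thresholdBlanking() and the returned std::vector<bool> are illustrative assumptions; in the library, marking is done through the mark() method of the algorithm base class, which is not reproduced here.

```cpp
#include <complex>
#include <cstddef>
#include <vector>

// Marks every sample in [begin, end) whose real part exceeds `threshold`.
// Works with any forward iterator over complex samples, for example
// iterators obtained for one station, polarization and moment in time.
template <typename Iterator>
std::vector<bool> thresholdBlanking(Iterator begin, Iterator end, double threshold)
{
    std::vector<bool> marked;
    for (Iterator it = begin; it != end; ++it) {
        marked.push_back((*it).real() > threshold);
    }
    return marked;
}
```

Each sample costs one comparison, which fits the budget of a few floating point operations per sample. Because every sample is tested independently, the iteration order does not influence the result.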

5.2.2. Parametrized Threshold Blanking

Parametrized threshold blanking is a modification of the simple threshold algorithm described above. The difference is that if we find a sample with a value exceeding the threshold, we mark not only that particular sample, but also a rectangular area around it. The parameters of the algorithm are:

• The value of the threshold.
• The size of the rectangle in the frequency domain.
• The size of the rectangle in the time domain.

A description of the algorithm in pseudo code can be found below:

for each station
    for each polarization
        for each moment in time domain
            get iterator over frequency
            for each iterated sample
                if real part of sample exceeds threshold
                    mark all the samples from the given rectangle around that sample
                        that have not been marked yet

The advantage of this approach is that for a really strong burst, the sample that exceeds the threshold is usually not the only one affected by the source of RFI, so we also mark its closest neighborhood. Just like the first algorithm, it can be used in both the time and the frequency domain.
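A possible sketch for one station and polarization is shown below. The two-dimensional containers and the function name are assumptions made for illustration, and the rectangle sizes are interpreted here as half-widths (the number of neighbors marked on each side), which is only one possible reading of the parameters.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

typedef std::vector<std::vector<double> > Grid;  // [time][channel], real parts
typedef std::vector<std::vector<bool> > Mask;    // same shape, true = marked

// Marks a rectangle of (2 * halfTime + 1) x (2 * halfFreq + 1) samples,
// clipped to the data boundaries, around every sample above the threshold.
Mask parametrizedThresholdBlanking(const Grid& realParts, double threshold,
                                   std::size_t halfFreq, std::size_t halfTime)
{
    const std::size_t nrTimes = realParts.size();
    const std::size_t nrChannels = nrTimes == 0 ? 0 : realParts[0].size();
    Mask marked(nrTimes, std::vector<bool>(nrChannels, false));
    for (std::size_t t = 0; t < nrTimes; ++t) {
        for (std::size_t f = 0; f < nrChannels; ++f) {
            if (realParts[t][f] <= threshold) continue;
            const std::size_t t0 = t > halfTime ? t - halfTime : 0;
            const std::size_t t1 = std::min(nrTimes - 1, t + halfTime);
            const std::size_t f0 = f > halfFreq ? f - halfFreq : 0;
            const std::size_t f1 = std::min(nrChannels - 1, f + halfFreq);
            for (std::size_t tt = t0; tt <= t1; ++tt)
                for (std::size_t ff = f0; ff <= f1; ++ff)
                    marked[tt][ff] = true;
        }
    }
    return marked;
}
```

The decision is always taken on the original values, so marking a neighbor never triggers further rectangles by itself.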

5.2.3. Var Threshold Blanking

Var Threshold Blanking is another modification of the threshold algorithm. To mark a sample as invalid, a given number of consecutive samples in the frequency domain have to exceed the threshold value. The main reason to use this algorithm is to find RFI that is not as strong as single bursts, but is spread across multiple frequencies. Because of that, it can be used only in the frequency domain. The parameters of Var Threshold Blanking are:

• The value of the threshold.
• The number of consecutive samples that have to exceed the threshold level (flag border).

A description of the algorithm in pseudo code can be found below:

for each station
    for each polarization
        for each moment in time domain
            get iterator over frequency
            count = 0
            for each iterated sample
                if real part of sample exceeds threshold
                    increment count
                    if count > flag border
                        mark processed sample
                    else if count == flag border
                        to_mark = count
                        while to_mark >= 0
                            mark sample, which is to_mark positions before processed sample
                            decrement to_mark
                else
                    count = 0


This is a simplified version of the algorithm described in [8], adapted to data before correlation.
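The idea can be sketched for a single frequency row as follows. This sketch marks exactly the samples that belong to a run of at least flagBorder consecutive samples above the threshold, which is one reading of the description; the handling of run boundaries in the library itself may differ. The function name and the plain vector of real parts are assumptions made for illustration.

```cpp
#include <cstddef>
#include <vector>

// Marks runs of at least `flagBorder` consecutive samples whose real
// part exceeds `threshold`. `realParts` holds one row in frequency order.
std::vector<bool> varThresholdBlanking(const std::vector<double>& realParts,
                                       double threshold, std::size_t flagBorder)
{
    std::vector<bool> marked(realParts.size(), false);
    std::size_t count = 0;  // length of the current run above the threshold
    for (std::size_t i = 0; i < realParts.size(); ++i) {
        if (realParts[i] > threshold) {
            ++count;
            if (count > flagBorder) {
                marked[i] = true;               // run already qualified
            } else if (count == flagBorder) {
                for (std::size_t k = 0; k < count; ++k)
                    marked[i - k] = true;       // mark the whole run so far
            }
        } else {
            count = 0;                          // run interrupted
        }
    }
    return marked;
}
```

Short runs, such as two samples above the threshold when the flag border is three, are left unmarked.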

5.2.4. Sum Threshold Blanking

Sum threshold blanking is a variant of the previous algorithm. The only difference is that, instead of testing whether each of the consecutive samples exceeds some value, we test whether the sum of a given number of consecutive samples fulfills this condition. The parameters are:

• The value of the threshold for the sum.
• The number of summed samples.

A description of the algorithm in pseudo code can be found below:

for each station
    for each polarization
        for each moment in time domain
            get iterator over frequency
            sum = 0
            clear samples Queue
            for each iterated sample
                sum += real part of sample
                push real part of sample into samples Queue
                if size of samples Queue > number of summed samples
                    sum -= value taken from the end of queue
                if sum exceeds threshold
                    if in last iteration sample was marked
                        mark this sample
                    else
                        mark last N samples (including processed one),
                            where N = number of summed samples

This is a simplified version of the algorithm described in [8], adapted to data before correlation.
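A sliding-window sketch of this idea for one frequency row is given below. The function name and the containers are illustrative assumptions. For simplicity the sketch always marks the whole current window when the sum exceeds the threshold; when the previous window was already marked, this sets only one new flag, so the outcome matches the shortcut in the pseudo code.

```cpp
#include <cstddef>
#include <deque>
#include <vector>

// Marks all samples of every window of up to `window` consecutive samples
// whose sum of real parts exceeds `threshold`.
std::vector<bool> sumThresholdBlanking(const std::vector<double>& realParts,
                                       double threshold, std::size_t window)
{
    std::vector<bool> marked(realParts.size(), false);
    std::deque<double> queue;   // real parts currently inside the window
    double sum = 0.0;           // running sum of the window
    for (std::size_t i = 0; i < realParts.size(); ++i) {
        sum += realParts[i];
        queue.push_back(realParts[i]);
        if (queue.size() > window) {
            sum -= queue.front();   // drop the oldest sample from the sum
            queue.pop_front();
        }
        if (sum > threshold) {
            for (std::size_t k = 0; k < queue.size(); ++k)
                marked[i - k] = true;
        }
    }
    return marked;
}
```

Keeping a running sum means each sample costs one addition and one subtraction, regardless of the window length.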

5.2.5. Auto Threshold Blanking

This is the first algorithm in the library with a dynamically determined threshold. For each chunk of data, it calculates the mean and the standard deviation of the real parts of the samples, and sets the threshold level to mean + aggressiveness * standard deviation, where aggressiveness is a parameter of the algorithm. A description of the algorithm in pseudo code can be found below:

for each station
    for each polarization
        for each moment in time domain
            get iterator over frequency
            calculate mean of real parts of samples
            calculate standard deviation of real parts of samples
            set threshold level to mean + aggressiveness * standard deviation
            for each iterated sample
                if real part of sample exceeds threshold
                    mark sample

A second variant of this algorithm has been implemented as well: the median variant. The only difference is that this variant sets the threshold level to median + aggressiveness * standard deviation.
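The mean variant can be sketched for one row of data as follows. The function name is hypothetical, and the use of the population standard deviation (dividing by the number of samples) is an assumption; the text does not state which estimator the library uses.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Marks samples whose real part exceeds mean + aggressiveness * stddev,
// with both statistics computed from the row itself.
std::vector<bool> autoThresholdBlanking(const std::vector<double>& realParts,
                                        double aggressiveness)
{
    std::vector<bool> marked(realParts.size(), false);
    if (realParts.empty()) return marked;

    const double n = static_cast<double>(realParts.size());
    double mean = 0.0;
    for (std::size_t i = 0; i < realParts.size(); ++i) mean += realParts[i];
    mean /= n;

    double variance = 0.0;
    for (std::size_t i = 0; i < realParts.size(); ++i) {
        const double d = realParts[i] - mean;
        variance += d * d;
    }
    variance /= n;  // population variance (assumption)

    const double threshold = mean + aggressiveness * std::sqrt(variance);
    for (std::size_t i = 0; i < realParts.size(); ++i)
        marked[i] = realParts[i] > threshold;
    return marked;
}
```

Note that strong RFI raises both the mean and the standard deviation of the row, and with them the threshold. The median used by the second variant is less sensitive to such outliers than the mean.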

5.2.6. The APB Algorithm

The APB Algorithm is a threshold algorithm with dynamic calculation of the threshold level. The main idea is, like in the Auto Threshold Blanking algorithm, to mark samples that exceed the level µ + aggr · σ, where µ is the mean of the samples, σ is their standard deviation, and aggr describes how aggressive the algorithm should be. The APB Algorithm differs from Auto Threshold Blanking in two main ways:

• It works on norms of samples, instead of their real parts.
• Statistics are estimated.

The algorithm has the following parameters:

• aggr - aggressiveness of the algorithm.
• FIFO - length of the queue used to calculate the mean and the standard deviation.
• n-blank - number of samples that have to be marked around the sample exceeding the threshold (has to be lower than the FIFO length).
• step - used to increase efficiency by checking only one in every step samples.

A description of the algorithm in pseudo code can be found below:

for each station
    for each polarization
        for each moment in time domain
            get iterator over frequency domain
            detectInLine()

where the detectInLine() method looks as follows:

calculate initial mean of norms for first FIFO samples
calculate initial standard deviation of norms for first FIFO samples
iterate over samples with given step
    estimate new mean of norms
    estimate new standard deviation of norms
    if norm of given sample exceeds mean + aggr * standard deviation
        mark (n-blank / 2) samples before and after this sample

As this algorithm flags independent samples, theoretically it can be used in both the time and the frequency domain. A detailed description of the algorithm can be found in [14].

5.2.7. Threshold Flagging

Threshold flagging, in contrast to the algorithms described in the previous sections, works on data after correlation. The main idea is the same as in the algorithm described in section 5.2.1: we mark samples with values above a given level. Just like in the blanking version, the only parameter of this algorithm is the threshold level. A description of the algorithm in pseudo code can be found below:

for each baseline (pair of stations)
    for each polarization
        for each moment in time domain
            get iterator over frequency
            for each iterated sample
                if real part of sample exceeds threshold
                    mark sample

5.2.8. Var Threshold Flagging

Var threshold flagging is an adaptation of the algorithm described in section 5.2.3 to data after correlation. It has the same parameters, and the only difference is that it works on baselines (pairs of stations) instead of single stations. A description of the algorithm in pseudo code can be found below:

for each baseline (pair of stations)
    for each polarization
        for each moment in time domain
            get iterator over frequency
            count = 0
            for each iterated sample
                if real part of sample exceeds threshold
                    increment count
                    if count > flag border
                        mark processed sample
                    else if count == flag border
                        to_mark = count
                        while to_mark >= 0
                            mark sample, which is to_mark positions before processed sample
                            decrement to_mark
                else
                    count = 0

5.3. Adding algorithms to the RFI Processing Library

To add a new algorithm to the library, we have to create a class that inherits from one of two classes:

• PreCorrelationAlgorithm - for algorithms working on data before correlation.
• PostCorrelationAlgorithm - for algorithms working on data after correlation.

The algorithm has to override the pure virtual method detect(). This method is called by the InterferenceMitigator object, and its goal is to detect invalid samples. In the detect() method we should use either getIteratorsFrequency() or getIteratorsTime() to obtain iterators over the data. The choice depends on whether we want to iterate over the time or over the frequency domain. If the algorithm decides that a particular sample is invalid, it has to call the mark() method from the base class. This method is responsible for updating the internal structures used for calculating statistics and for comparing algorithms with each other. The base classes also define the getMedian(), getAverage() and getStdDev() methods. They can be used, for example, for determining the threshold.

5.4. Implemented station detectors

5.4.1. PreCorrelation Detector

The PreCorrelation Detector detects corrupted stations based on the samples before correlation. By calculating the mean and standard deviation of the real parts of the samples across all the stations, we determine a window into which the real values should fit. If the value from some station does not fit into this window, we increment the counter of marked samples for that station. If the counter exceeds a given level, the station is marked as corrupted. The Detector has the following parameters:

• aggr - aggressiveness of the algorithm.
• part ∈ (0, 1) - part of all the samples that have to be disturbed to mark the station.

The algorithm looks as follows:

for each time chunk of data
    stationsCounter = 0
    for each sample, channel, polarization
        calculate mean of real parts of samples
        calculate standard deviation of real parts of samples
        low = mean - aggr * standard deviation
        high = mean + aggr * standard deviation
        for each station
            if real part of sample is not in (low, high)
                increment stationsCounter[station]
            if stationsCounter[station] exceeds (the number of all samples per station * part)
                mark station as corrupted on this chunk of data
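As an illustration, the detector can be sketched in Python for one chunk of data. This is a sketch under stated assumptions rather than the library code: the function name and the data layout `chunk[station][k]` (real part of the k-th sample/channel/polarization combination) are hypothetical, the population standard deviation is an assumed choice, and positions where all stations agree exactly are skipped because the window would be empty.

```python
from statistics import mean, pstdev


def detect_corrupted_stations(chunk, aggr, part):
    """Return one boolean per station: True if the station looks corrupted.

    chunk[station][k] is the real part of the k-th (sample, channel,
    polarization) combination for that station (hypothetical layout).
    """
    n_stations = len(chunk)
    n_samples = len(chunk[0])
    counters = [0] * n_stations
    for k in range(n_samples):
        values = [chunk[s][k] for s in range(n_stations)]
        m = mean(values)
        sd = pstdev(values)  # assumption: population standard deviation
        if sd == 0:
            continue  # all stations agree, nothing can be outside the window
        low, high = m - aggr * sd, m + aggr * sd
        for s, v in enumerate(values):
            if not (low < v < high):
                counters[s] += 1
    # A station is corrupted if more than `part` of its samples were outside.
    return [c > n_samples * part for c in counters]
```

For example, with five stations of which one delivers values ten times larger than the others, `aggr = 1.5` and `part = 0.5` mark only the deviating station.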

5.4.2. PostCorrelation Detector

The PostCorrelation Detector is very similar to the PreCorrelation Detector - the main difference is that it works on the data after correlation. Because of that, we aggregate the data over baselines (pairs of stations) instead of stations. The Detector has the following parameters:

• aggr > 0 - aggressiveness of the algorithm.
• baselines - number of baselines including a particular station that have to be disturbed to mark the station.
• part ∈ (0, 1) - part of all the samples that have to be disturbed to mark the station.

The algorithm looks as follows:

for each time chunk of data
    stationsCounter = 0
    for each sample, channel, polarization
        stationsCounterLocal = 0
        calculate mean of real part of samples
        calculate standard deviation of real part of samples
        low = mean - aggr * standard deviation
        high = mean + aggr * standard deviation
        for each baseline(station1, station2)
            if real part of sample is not in (low, high)
                increment stationsCounterLocal[station1]
                increment stationsCounterLocal[station2]
        for each station
            if stationsCounterLocal[station] exceeds baselines
                increment stationsCounter[station]
            if stationsCounter[station] exceeds (the number of samples per baseline * part)
                mark station as corrupted on this chunk of data

5.5. Adding station detectors to the RFI Processing Library

To add a new station detector to the library, we have to create a class that inherits from the StationDetector template class. The detector has to override the abstract virtual method detect(). This method is called by the InterferenceMitigator object; its goal is to detect invalid stations. The methods getIteratorsBaselines() and getIteratorsStations() can be used to iterate over samples in the stations domain. The detect() method is called once for each chunk of data. The result of the detection process is a vector of boolean values that states, for each station, whether it is corrupted or not.


Chapter 6 Results obtained for LOFAR

6.1. Integration with LOFAR correlator software

To test the algorithms, a dedicated version of the processing software has been created. Instead of using the live data streams from the stations, it takes the data from existing stored observations. The tests can therefore be run offline, outside the BlueGene/P computer. The great advantage of doing so is that tests are repeatable on the same data, allowing us to compare the effectiveness of the algorithms.

6.1.1. Adapters

To make the data understandable by the library, two adapter classes have been created: one for data before correlation and one for data after correlation. The class that contains precorrelated data after the FFT is called FilteredData. Samples are stored in a four-dimensional multi array from the Boost library, containing data represented by complex float numbers. The dimensions are:

• The number of samples (time domain).
• The number of channels (frequency domain).
• The polarization of sample (X or Y).
• The number of stations.

The FilteredDataAdapter class has been created to adapt the data. To provide the required functionality, a one-dimensional view of the array is created, and iterators to the beginning and end of this view are returned. The class also defines two types of data:

• SampleType - as float.
• IteratorType - as MultiDimArray::iterator - standard Boost iterator over a one-dimensional array (details can be found in [15]).

The class that contains correlated data is called CorrelatedData. Samples are stored in a four-dimensional multi array from the Boost library, containing data represented by complex float numbers. The dimensions are:

• The number of channels (frequency domain).
• The number of baselines (two stations combined).
• The polarization of sample from the first station (X or Y).
• The polarization of sample from the second station (X or Y).

Note that there is no time domain, because the samples over a whole second are integrated into one by complex addition (for details, see [6]). To adapt the data, the CorrelatedDataAdapter class has been created. This class returns an iterator over a one-dimensional view as well. The types defined by this class are exactly the same as in the FilteredDataAdapter case.

The InterferenceMitigator, with a FilteredDataAdapter and a CorrelatedDataAdapter as template parameters, is created in the main processing class of the real-time processing software. The detection process in FilteredData is performed just after the FFT, while in CorrelatedData it is done after the correlation process. According to Figure 2.3, in the LOFAR processing pipeline RFI mitigation on the uncorrelated data is done after the PPF (6), while on the correlated data it is done just after the correlation (16). The feedback mechanism affects data after the superstation beam forming (9).

6.2. Testing methodology

The tests have been performed on an idle server with the following configuration:

• Intel(R) Core(TM) i7 CPU 920 2.67GHz - 8 cores
• 6 GB RAM
• Linux jupiter 2.6.31-20-generic #58-Ubuntu SMP Fri Mar 12 04:38:19 UTC 2010 x86_64 GNU/Linux

Real observation files were used, containing the raw station output data of an observation with 5 stations and 5 subbands. 5 MPI processes working in parallel were used. The observation length was about two hours, and it was done on Wed Apr 28 2010 19:25:03 GMT+0200. Subbands and stations were chosen to present different types of RFI. The list of subbands:

• 138 (27MC radio wavelength)
• 183 (model airplanes, the stations are close to a model airplanes airfield)
• 256 (clean)
• 282 (TV)
• 283 (TV)

The stations used (CS = core station, RS = remote station):

• CS004
• CS006 (close to CS004)
• RS205 (electric fence to keep sheep in the field)
• RS306
• RS208 (far away from the core)

6.3. Results for the post correlation algorithms

In Figure 6.1 we can see the visualized correlated data in the time / frequency domain. The red part represents clean data, while the white part represents RFI detected by the offline reference flagger on the complete dataset. As this subband contains a lot of RFI, we will use this figure throughout this section as a reference for estimating the power of the post correlation algorithms.

Figure 6.1: Visualization of the LOFAR data on the model airplanes subband - reference

Blue points in each of the next figures represent samples marked by the particular algorithm named in the figure's caption.

In Figure 6.2 we can see the results obtained by the Threshold Flagging Algorithm with the threshold level set to 0.005. Although a large part of the RFI is correctly marked, there is a huge amount of false positives, which disqualifies this setting for practical use. In Figure 6.3 we can see the results obtained by the Threshold Flagging Algorithm with the threshold level set to 0.5. In this case there are no false positives, but the amount of correctly marked RFI drops sharply. Therefore, a threshold level that high cannot be used in practice either.

In Figure 6.4 we can see the results obtained by the Threshold Flagging Algorithm with the threshold level set to 0.05. Now the amount of correctly marked samples is almost the same as with the threshold level set to 0.005, and there are still no false positives. That makes 0.05 the most suitable choice for the threshold level in the Threshold Flagging algorithm. In Figure 6.5 we can see the results obtained by this algorithm on the clean subband for a longer period of time. Also in this case the algorithm does not produce any false positives, while a significant part of the RFI is marked.

Figures 6.6 - 6.8 show the results obtained by the Var Threshold Flagging Algorithm. The threshold level in each case is set lower than 0.5, as for this level even the Threshold Flagging algorithm produces no false positives.

Figure 6.2: Visualization of the Threshold Flagging results with threshold = 0.005

Figure 6.3: Visualization of the Threshold Flagging results with threshold = 0.5


Figure 6.4: Visualization of the Threshold Flagging results with threshold = 0.05

Figure 6.5: Visualization of the Threshold Flagging results with threshold = 0.05 (clean subband)

In Figure 6.6 we can see the results obtained by the Var Threshold Flagging Algorithm with the threshold level set to 0.01 and the window size set to 4. There are almost no false positives, but the amount of correctly flagged samples is lower than that achieved by the Threshold Flagging algorithm with the threshold level set to 0.05. Unfortunately, as can be seen in Figures 6.7 and 6.8, decreasing the threshold value only increases the false-positive ratio, while the amount of correctly marked samples stays constant.

Figure 6.6: Visualization of the Var Threshold Flagging results with threshold = 0.01 and window size = 4

Figure 6.7: Visualization of the Var Threshold Flagging results with threshold = 0.001 and window size = 11


Figure 6.8: Visualization of the Var Threshold Flagging results with threshold = 0.0005 and window size = 15

6.4. Results for the pre correlation algorithms

As no tool for visualizing uncorrelated data has been created, for the algorithms working on precorrelated samples we provide an analysis of the results based on the percentage of samples marked by each algorithm. Because the subbands were carefully chosen, we can estimate the power of the algorithms. Figures 6.9 - 6.16 show the percentage of marked samples depending on the algorithm parameter. Each line represents one subband.

As the impact of RFI on the clean subband should be relatively small, we can treat the percentage of marked samples on the clean subband as the false-positive ratio. We can see that increasing the threshold (or window size) decreases the false-positive ratio and increases the false-negative ratio. Samples from the TV subbands, which obviously should be marked and are marked by algorithms with a lower threshold level, are no longer marked when we increase those values.

As we can see in Figure 6.10, some algorithms can have a large false-positive ratio in particular cases. In this case, model airplanes emit very short but relatively strong waves. If we mark not only the sample that is above the threshold, but also the samples before and after it, we drastically increase the false-positive ratio.

Figures 6.12 and 6.14 show that increasing the window size, even to small values like 5, decreases the percentage of marked samples to the level of statistical error. Only in the model airplanes subband is some RFI spread over frequencies in a single time moment. That makes those algorithms not very useful. Sum Threshold marks a lot more samples than Var Threshold. Many of those samples are false positives - one sample with an extremely high value affects the marking of the samples next to it.

In Figures 6.15 and 6.16 we can see the results obtained by the Auto Threshold Blanking algorithm. As the percentage of marked samples is almost the same for all the subbands and does not differ across the stations, none of the variants can be used in practice for determining the threshold. Results from the APB algorithm are not shown, as for the frequency domain it seems to be completely useless: the flagged data seems to be completely random, with the same amount of flagged samples for each station/subband.

Figure 6.9: Percentage of marked samples by the Threshold Blanking algorithm

6.4.1. Frequency/Time row average

Currently, the LOFAR software can mark only whole time/frequency rows in a chunk of data as invalid. To see if there is a need for a mechanism that marks only a few particular samples, we have created a mechanism to check how many samples are marked in a row that contains at least one marked sample. As samples after correlation are integrated over time, the results are relevant only for the pre-correlation algorithms. In Figure 6.17 we can see how many samples with different timestamps and the same frequency channel are marked, for those frequencies that have at least one sample marked as invalid. In Figure 6.18 we can see how many samples from different frequency channels with the same timestamp are marked, for those timestamps that have at least one sample marked as invalid.
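The row statistic described above can be sketched as follows. This is only an illustration: the function names and the `flags[time][frequency]` layout are hypothetical, and the sketch assumes that the reported number is the plain mean over all rows that contain at least one marked sample.

```python
def flagged_row_percentage(flags):
    """Average percentage of marked samples over rows with at least one mark.

    `flags` is a list of rows of booleans (hypothetical layout); rows without
    any marked sample are ignored, as in the text above.
    """
    shares = [100.0 * sum(row) / len(row) for row in flags if any(row)]
    return sum(shares) / len(shares) if shares else 0.0


def transpose(flags):
    """Swap the two axes so the same statistic can be taken along the other one."""
    return [list(column) for column in zip(*flags)]
```

Calling `flagged_row_percentage(flags)` on `flags[time][frequency]` gives the statistic per timestamp, and `flagged_row_percentage(transpose(flags))` gives it per frequency channel. A low value means that marking the whole row would discard mostly good samples.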

Figure 6.10: Percentage of marked samples by the Parametrized Threshold Blanking algorithm

As we can see in both figures, only in the model airplanes subband is RFI concentrated in particular places. In that case, a loss of good samples while marking the whole frequency/time row may be acceptable. In the other cases, the percentages are between 0 and 10 percent, so we would mark more than ten times more good samples than bad samples.

6.5. Results for the station detectors

Tables 6.1 and 6.2 present data obtained by the Pre Correlation Station Detector for three different subbands:

• The radio subband.
• The clean subband.
• The TV subband.

The first row of each table gives the aggressiveness of the algorithm; the first column gives the percentage of samples that have to be corrupted to mark the station as misbehaving. Each field in the table has three values, for the radio, clean, and TV subband, respectively.

In Table 6.1 we can see the percentage of chunks of data that have been marked for the given data. As all the stations were behaving correctly, we can treat the values in the table as the false-positive ratio. As we can see, the values in the top left corner of the table, with aggressiveness around 1.25 and required percentage of marked samples around 10, are close to 100, so their usefulness is close to 0. On the other hand, the algorithm with aggressiveness around 2 and required percentage of samples above 30 had no false positives.

Parameters   1.25              1.50              1.75              2.00          2.25
10           100 / 100 / 100   80 / 99 / 100     20 / 19 / 3       0 / 0 / 0     0 / 0 / 0
20           60 / 99 / 100     20 / 20 / 10      11 / 1 / 0        0 / 0 / 0     0 / 0 / 0
30           20 / 20 / 19      19 / 3 / 0        0 / 0 / 0         0 / 0 / 0     0 / 0 / 0
40           20 / 7 / 0        2 / 0 / 0         0 / 0 / 0         0 / 0 / 0     0 / 0 / 0

Table 6.1: Pre Correlation StationDetector - percentage of marked stations on clear data

Figure 6.11: Percentage of marked samples by the Var Threshold Blanking algorithm with window size = 3

To test the algorithm, we have created a mechanism for injecting artificial disturbance into the samples from a given station. Two different approaches have been used:

• Multiplying all the samples from the particular station by a given constant factor.
• Multiplying all the samples from the given station by a random number in (−f, f), where f is a given factor.
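The two disturbance approaches listed above can be sketched in a few lines of Python. The function names are hypothetical and the samples are modelled as plain complex numbers; note that `random.Random.uniform` may return the end points of the interval, which is a negligible difference from the open interval (−f, f) for this purpose.

```python
import random


def scale_station(samples, factor):
    """Multiply every sample of one station by the same constant factor."""
    return [s * factor for s in samples]


def randomize_station(samples, factor, rng=None):
    """Multiply every sample by its own random number drawn from [-factor, factor]."""
    rng = rng or random.Random()
    return [s * rng.uniform(-factor, factor) for s in samples]
```

Passing a seeded `random.Random` instance as `rng` keeps the injected disturbance repeatable between test runs, in line with the offline testing approach of Section 6.1.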

Figure 6.12: Percentage of marked samples by the Var Threshold Blanking algorithm with threshold = 1200

A set of tests using both approaches has been performed, with parameters ranging from 2 to 8. Table 6.2 contains the results of the tests. The values in the table represent the percentage of artificially disturbed chunks of data that have been correctly recognized as resulting from a misbehaving station.

Parameters   1.25              1.50              1.75              2.00          2.25
10           100 / 100 / 100   100 / 100 / 100   100 / 100 / 100   0 / 0 / 0     0 / 0 / 0
20           100 / 100 / 100   100 / 100 / 100   50 / 50 / 50      0 / 0 / 0     0 / 0 / 0
30           100 / 100 / 100   50 / 80 / 79      50 / 50 / 50      0 / 0 / 0     0 / 0 / 0
40           50 / 85 / 82      50 / 50 / 50      50 / 50 / 50      0 / 0 / 0     0 / 0 / 0

Table 6.2: Pre Correlation Stations Detector - percentage of marked disturbed chunks of data

As we can see, for the aggressiveness set to 1.5 and the required percentage of samples set to 20, all the disturbed stations were recognized correctly. Unfortunately, the false-positive ratio in this case is still significant. Still, if the quality of the astronomical data is crucial for a particular observation, we can afford to lose some stations that behave relatively badly in order to get completely clean output. For the aggressiveness set to 1.75 and the required percentage of samples set to 30, we still correctly recognize 50 percent of the artificial disturbance with no false positives at all. If we cannot afford a loss of good data, this seems to be the most suitable choice.

Unfortunately, the results obtained for the Post Correlation Stations Detector were not satisfying - either all the stations were marked or none of them.

Figure 6.13: Percentage of marked samples by the Sum Threshold Blanking algorithm with window size = 3

6.6. Processing time

In Figure 6.19 we can see the processing time for one subband and 5 stations for the precorrelation algorithms. As LOFAR will have approximately 64 stations, those results should be multiplied by a factor of 13. Still, the tests were performed on the test server, so the results can differ on the BlueGene/P supercomputer.

We can see that the processing times for the first 3 algorithms are very similar, around 15 ms - so for 64 stations they would be around 200 ms. The Sum Threshold Blanking algorithm is more computationally expensive - its processing time is around 3 times higher than in the other cases, while its results are very similar to (or worse than) those of the Var Threshold Blanking algorithm. The cost of adding a dynamically calculated threshold can be seen for the Auto Threshold Blanking. While the results are highly unsatisfying, calculating the mean and standard deviation makes the algorithm twice as expensive as in the standard case. The median variant is even more expensive. The APB algorithm, which works on norms, is extremely expensive.

The PreCorrelation Stations Detector algorithm is very fast, as calculating the mean and standard deviation across a few stations is not computationally expensive. Therefore, given the good behavior of this algorithm, it is recommended to integrate this solution with the existing software.

If we want to create our own mechanism for determining the threshold, Figure 6.20 shows the time needed for calculating the essential statistics using the built-in solutions. As we can see, the time needed for calculating an average is around 5 ms - less than half the time needed to perform simple thresholding - but the time needed for calculating the median is 6 times higher than that, because the selection algorithm is computationally expensive [13]. Calculating the standard deviation is only two times more expensive than calculating the average value, but the average value has to be calculated first.

In Figure 6.21 we can see the processing time for the postcorrelation algorithms. The times look very low, but they grow quadratically with the number of stations. To get the processing time for 64 stations we have to multiply those results by a factor of 200, giving results only a little lower than in the precorrelation case.

Figure 6.14: Percentage of marked samples by the Sum Threshold Blanking algorithm with threshold = 2000 * window size
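The scaling factors quoted above can be checked with a short calculation. The sketch below assumes that precorrelation cost grows linearly with the number of stations and that postcorrelation cost grows with the number of cross-correlation baselines, n(n − 1)/2; the function names are hypothetical. Under these assumptions the factors come out at 12.8 and 201.6, matching the rounded values 13 and 200 used in the text. If autocorrelations were counted as well, n(n + 1)/2, the second factor would be about 139 instead.

```python
def station_scaling(n_test, n_full):
    """Linear scaling factor for algorithms that work per station."""
    return n_full / n_test


def baseline_scaling(n_test, n_full):
    """Scaling factor for algorithms that work per baseline.

    Assumption: only cross-correlation baselines n * (n - 1) / 2 are counted.
    """
    def baselines(n):
        return n * (n - 1) // 2

    return baselines(n_full) / baselines(n_test)


if __name__ == "__main__":
    print(station_scaling(5, 64))        # factor for precorrelation algorithms
    print(baseline_scaling(5, 64))       # factor for postcorrelation algorithms
    print(15 * station_scaling(5, 64))   # 15 ms measured for 5 stations, in ms
```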

Figure 6.15: Percentage of marked samples by the Auto Threshold Blanking algorithm (average variant)

6.7. Results of comparisons

6.7.1. Comparisons between precorrelation algorithms

The results of comparisons between precorrelation algorithms show that the algorithms were designed to find different types of RFI. Comparisons between algorithms of the same type with different parameters can be omitted - it is obvious that an algorithm with a lower threshold marks all the samples marked by the algorithm with a higher threshold, plus some additional samples. In Table 6.3 we compare the algorithms. The value in column A and row B is the average percentage of the samples marked by algorithm B that have also been marked by algorithm A.

                         Threshold   Parametrized Threshold   Var Threshold   Sum Threshold
Threshold                -           85                       29              60
Parametrized Threshold   40          -                        15              37
Var Threshold            85          90                       -               90
Sum Threshold            55          83                       60              -

Table 6.3: Comparison between precorrelation algorithms - TV Subband

Figure 6.16: Percentage of marked samples by the Auto Threshold Blanking algorithm (median variant)

As we can see in the table, none of the algorithms marks more than 40 percent of the samples marked by the Parametrized Threshold Blanking algorithm - the false-positive ratio of this algorithm is very high. The Var Threshold Blanking algorithm specializes in finding RFI that is spread over multiple frequencies, so it marks only 30 percent of the samples marked by the simple threshold algorithm - the false-negative ratio of the Var Threshold algorithm for short single bursts is very high. On the other hand, almost all the samples marked by the Var Threshold algorithm are marked by the other ones. The Sum Threshold algorithm marks 90 percent of the samples marked by the Var Threshold algorithm. At the same time, it marks 60 percent of the samples marked by the simple threshold - twice as many as the Var Threshold Blanking. This means that, with a similar true-positive ratio, the false-negative ratio of the Sum Threshold Blanking is lower. Table 6.3 presents the results from the TV subband, but the results from the other subbands were comparable.
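A pairwise comparison of this kind can be sketched as follows. The sketch follows the way the values are read in the discussion above (the share of one algorithm's marked samples that another algorithm marks as well); the function name and the representation of marked samples as sets of sample indices are hypothetical.

```python
def overlap_percentage(marked_by_b, marked_by_a):
    """Percentage of the samples marked by B that are also marked by A.

    Both arguments are sets of sample indices (hypothetical representation).
    """
    if not marked_by_b:
        return 0.0
    return 100.0 * len(marked_by_b & marked_by_a) / len(marked_by_b)
```

The measure is not symmetric: if A marks 4 samples, B marks 8, and they share 2, then A covers 25 percent of B's samples while B covers 50 percent of A's. This is why the table has different values above and below the diagonal.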

6.7.2. Results obtained by the feedback mechanism

The results using the feedback loop were highly unsatisfying - the percentage of samples marked by the feedback mechanism in the data before correlation was very low, even for high percentages of samples marked in the correlated data (around 10 percent). Checking whether the samples from all the baselines including a particular station are marked as RFI in the data after correlation is not a good criterion for the feedback mechanism. For example, the feedback mechanism for the Threshold Flagging algorithm with the threshold level set to 0.005, which gives a huge amount of false positives (see Section 6.3), marked only 0.3 percent of the samples in the uncorrelated data. Keeping in mind that, because of the data integration, we mark only whole time rows, this amount is extremely small.

We tried to change the mechanism to achieve a higher percentage of marked samples. The feedback mechanism was made less strict: to mark a sample in the uncorrelated data as invalid, only a part of the baselines including this station has to be marked as invalid by the post correlation algorithm. Still, even after setting the required number of affected baselines to 2, the valuable Threshold Flagging algorithm with the threshold level set to 0.05 marked less than one percent of the uncorrelated samples. Therefore, the feedback mechanism in its current state cannot be used for the LOFAR telescope.

Figure 6.17: Percentage of marked samples in a time row
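The feedback criterion discussed above can be sketched as follows. This is an illustration under stated assumptions, not the implemented mechanism: the function name and the representation of flagged baselines as pairs of station indices are hypothetical, and the strict variant is taken to mean that all n − 1 cross-correlation baselines of a station are flagged.

```python
def stations_to_mark(flagged_baselines, n_stations, required=None):
    """Decide per station whether its uncorrelated data should be marked.

    `flagged_baselines` is an iterable of (station1, station2) index pairs
    flagged by a post correlation algorithm (hypothetical representation).
    With required=None the strict criterion is used: every cross-correlation
    baseline that includes the station must be flagged (an assumption).
    """
    counts = [0] * n_stations
    for s1, s2 in flagged_baselines:
        counts[s1] += 1
        if s2 != s1:
            counts[s2] += 1
    if required is None:
        required = n_stations - 1
    return [c >= required for c in counts]
```

With four stations and the baselines (0,1), (0,2), (0,3) and (1,2) flagged, the strict criterion marks only station 0, while `required=2` also marks stations 1 and 2.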


Figure 6.18: Percentage of marked samples in a frequency row


Figure 6.19: Processing time for one subband and 5 stations - Precorrelation Algorithms


Figure 6.20: Processing time for one subband and 5 stations - Statistics


Figure 6.21: Processing time for one subband and 5 stations - Postcorrelation Algorithms


Chapter 7 Conclusions

7.1. Recommendation for LOFAR telescope

As some of the science pipelines in the LOFAR telescope do not use correlated data at all, and the feedback mechanism is highly unsatisfying, implementing at least one of the algorithms that work on the data before correlation is essential. As compute nodes get chunks with only one second of data, and each time they can get a different subband, we should focus more on the frequency domain. Therefore, we have:

• The APB Algorithm - it was designed to work in the time domain and, according to the results, it is not applicable to the frequency domain (see Section 6.4).
• The Parametrized Threshold Blanking algorithm - it has a very high false-positive ratio (see Figure 6.10).
• The Var Threshold Blanking algorithm and the Sum Threshold Blanking algorithm - in the frequency domain they are supposed to detect RFI that is spread over multiple frequency channels, but according to the results, increasing the window size to values above 5 drastically decreases the number of marked samples (see Figures 6.12 and 6.14), so they do not fulfill their primary goal.
• The Threshold Blanking Algorithm - it detects most of the data marked by the other algorithms and has a very low false-positive ratio (see Section 6.4).

According to the facts presented above, the simple Threshold Blanking Algorithm seems to be the most suitable choice for online LOFAR RFI mitigation on data before correlation. As the algorithm is very simple, it can easily be implemented on the BlueGene/P supercomputer in a very effective way. Unfortunately, the time / frequency row average is very low (see Section 6.4.1), and marking the whole time or frequency row would mean a big loss of good data. Therefore, a mechanism that makes marking particular samples possible has to be implemented as well.

The threshold for the algorithm can be set as a constant at a level around 1200, where the false-positive ratio seems to be very small and most of the data influenced by RFI is removed. Because the algorithm is simple, we can also afford to determine the threshold level dynamically, using the implemented methods getMedian(), getAverage() and getStdDev(), but as the existing methods do not seem to be accurate (see Figures 6.15 and 6.16), new ones have to be developed.

As a tool for recognizing misbehaving stations we recommend implementing the Pre Correlation Station Detector - the results for this algorithm were very promising (see Section 6.5) and the algorithm is very efficient. Depending on the desired true/false-positive ratio, we can choose the most suitable parameters based on the achieved results.

As the results obtained for the post correlation algorithms are not outstanding (see Section 6.3), online RFI mitigation is essential mainly for the pipelines that do not use the correlated data, such as the beam forming mode, and the feedback mechanism is not working as expected, we do not recommend any algorithm that works on post correlation data.

7.2. Future Work

Future work includes:

• Implementing the recommended algorithms on the BlueGene/P supercomputer, possibly in assembler language.
• Implementing a mechanism in the software correlator for marking particular samples (and not just rows / columns).
• Research on methods for dynamically determining the threshold levels.
• Adding more algorithms to the RFI Processing Library - both for data before and after correlation.


Bibliography

[1] Wikipedia, http://en.wikipedia.org/wiki/
[2] Miller, Diane F. (1998). Basics of Radio Astronomy for the Goldstone-Apple Valley Radio Telescope. http://www2.jpl.nasa.gov/radioastronomy/
[3] Burke, Bernard F.; Graham-Smith, Francis (2002). An Introduction to Radio Astronomy (2nd ed.). Cambridge University Press.
[4] Felli, M.; Spencer, R. E. (1989). Very long baseline interferometry. Techniques and applications. Kluwer Academic Publishers.
[5] Thompson, A. R.; Moran, J. M.; Swenson Jr., G. W. (2001). Interferometry and Synthesis in Radio Astronomy (2nd ed.). John Wiley & Sons, Inc.
[6] Romein, J. W.; Broekema, P. C.; Mol, J. D.; van Nieuwpoort, R. V. The LOFAR Correlator: Implementation and Performance Analysis. ACM Symposium on Principles and Practice of Parallel Programming (PPoPP'10), Bangalore, India, pp. 169-178, January 2010.
[7] McClellan, J.; Schafer, R.; Yoder, M. (2003). Signal Processing First. Pearson Education Inc.
[8] Offringa, A. R.; de Bruyn, A. G.; Biehl, M.; Zaroubi, S.; Bernardi, G.; Pandey, V. N. Post-correlation radio frequency interference classification methods. Astronomy & Astrophysics 378, 327-344 (2001).
[9] Kesteven, M. The Current Status of RFI Mitigation in Radioastronomy. http://www.atnf.csiro.au/people/Michael.Kesteven/papers/
[10] Baan, W. A.; Fridman, P. A.; Millenaar, R. P. Radio Frequency Interference Mitigation at the Westerbork Synthesis Array: Algorithms, Test Observations and System Implementation. The Astronomical Journal, 128:933-949, August 2004.
[11] Kesteven, M.; Hobbs, G.; Clement, R. Adaptive Filters Revisited - RFI Mitigation in Pulsar Observations. Radio Science, Vol. 40, RS5S06, 10 pp., 2005.
[12] Deller, A. Software correlators as testbeds for RFI algorithms. RFI Mitigation Workshop, 29 - 31 March 2010, Groningen.
[13] Cormen, T. H.; Leiserson E. C.; Rivest R. L.; Clifford S. (2001). Introduction to Algorithms (2nd ed.). The Massachusetts Institute of Technology.
[14] Niamsuwan, N.; Johnson, J. T.; Ellingson, S. W. Examination of a simple pulse blanking technique for RFI mitigation. Radio Science, Vol. 40, RS5S03, 11 pp., 2005.
[15] Boost library documentation, http://www.boost.org/doc/
