Data Structure to Store GTFS Data Efficiently on Mobile Devices

JOURNAL OF COMPUTER SCIENCE AND SOFTWARE APPLICATION, Volume 1, Number 1, September 2014



Tamas Szincsak*, Aniko Vagner
Faculty of Informatics, University of Debrecen, Hungary
*Corresponding author: [email protected]

Abstract: GTFS (General Transit Feed Specification) feeds are well-known and useful datasets that describe the transit data of public transport agencies. Many applications use them for various purposes, but on today's mobile devices these applications cannot work efficiently with GTFS feeds without an internet connection and server-side processing. After a short overview of applications that use GTFS, we introduce our data structure, whose goal is to support storing and using GTFS feeds on mobile devices efficiently. Databases built using our data structure are small and compact. We distinguish between the logical data structure and its physical representation. The logical data structure is easy to understand, and it is the first step in transforming the data into the physical representation. We give three approaches for the physical representation: an object-relational database, C structures with memory snapshot, and protocol buffers. We explain the C structures with memory snapshot in detail and describe how they are generated from the logical data structure. Finally, we compare the sizes of the GTFS feeds and our databases for several cities.

Keywords: Data Structure; GTFS; Memory Snapshot; Mobile Application; Protocol Buffer

1. INTRODUCTION

Nowadays many people use smart mobile devices, so the demand for applications running on those devices is increasing. People need information systems that help them in everyday life. However, not every smart device user has an internet connection, and even when they do, it does not always work satisfactorily while traveling. Transit agencies can publish timetable information in GTFS (General Transit Feed Specification) format [1]. The GTFS structure follows the way transit agencies think, so it is relatively easy for them to produce a database that complies with the specification. However, this structure allows data to be stored redundantly, and different transit agencies tend to interpret some parts of the specification differently, which makes processing the database harder. For these reasons, storing and processing a GTFS database on mobile devices is slow, inefficient and difficult. In this article we introduce a data structure that can store the data of GTFS databases but is more suitable for mobile applications.

2. PUBLIC TRANSIT EXCHANGE SPECIFICATIONS

Many public transit exchange specifications exist. One example is TRANSMODEL [2], a reference data model for public transport operations that was developed within several European projects. It is also used in a Hungarian project [3]. Another example is TransXChange [4], the UK nationwide standard for exchanging bus schedules and related data. The third example is GTFS (General Transit Feed Specification), which is introduced in the next section. We can say that GTFS is the most widely used and best-known public transit exchange specification; the other two are used in only a few projects.

3. GTFS

“GTFS "feeds" allow public transit agencies to publish their transit data and developers to write applications that consume that data in an interoperable way.” [1]

There are several reasons why we work with GTFS databases. The first is that GTFS was created and is supported by Google, a large and stable company. For us this means that GTFS will probably be used, supported and developed for many years, so it is worth working with. Another reason is that many open source applications work with GTFS databases. It is also very important for us that GTFS databases can be acquired easily: a database can be downloaded as a simple zip file.

Our transit schedule application works in cities in Hungary. The GTFS database for Budapest can be downloaded from the website of the city's public transport company: http://www.bkk.hu/gtfs/budapest gtfs.zip. The GTFS databases of two other Hungarian cities (Debrecen and Nyíregyháza) are generated by a Hungarian organization whose website is http://www.derke.hu/. However, we do not consider only Hungarian cities: our data structure works for every city whose GTFS database is accessible. Public GTFS data can be downloaded from the websites of transit agencies, and collections of them can be found at http://www.transitfeed.com and http://www.gtfs-data-exchange.com/agencies.

Finally, there is one more reason why we use the GTFS structure: it is simple, unified, easy to understand and open source.

4. RELATED WORK

Many applications use GTFS. Antrim and Barbeau [5] grouped them and show examples for each group. The groups and their applications are:

• Trip planning and maps (Google Maps, OpenTripPlanner, Bing Maps)
• Ridesharing (Parkio, Avergo, etc.)
• Timetable creation (Timetable Publisher)
• Mobile applications (Google Maps, Transit App for iOS 6 and Beyond, etc.)
• Data visualization (Walk Score and Apartment Search feature, Mapnificent)
• Accessibility (Sendero Group BrailleNote GPS, Travel Assistant Device)
• Planning & analysis tools (OpenTripPlanner, Graphserver, etc.)
• Interactive Voice Response (IVR)
• Real-time transit information (OneBusAway, NextBus, etc.)

One of the well-known transit travel information systems is OneBusAway [6]. Among its other services it provides static route maps. Most services of OneBusAway need an internet connection. Its developers planned to gather statistics from mobile devices and provide online information to users. Another trip planner system is the Travel Assistant Device [7]. It also needs an internet connection to communicate with a server-side application.

Many of the systems collected by Antrim and Barbeau [5] are desktop applications, which have enough memory and processor capacity to process GTFS databases. Other applications run on mobile devices, but most of them need an internet connection so that they can connect to a server-side application that processes the GTFS database.

This review is not complete: the topic is very large, and many applications are not documented appropriately. Still, this short summary shows that few mobile applications serve transit schedule information without an internet connection. The reason may be that memory and processing capacity are limited on these devices. Our new data structure addresses the memory part of the problem.

The GTFS structure is also interesting from a database design point of view. Braga et al. [8] summarize the taxonomy of GTFS and introduce a GTFS Entity-Relationship model. Gerstle [9] built a relational data model based on the GTFS structure. Bick and Damphouse [10] developed a Python application that converts a GTFS database to a relational database; more precisely, it produces the SQL statements that build the relational database. We use similar data models to introduce our data structure for mobile devices.

5. OUR DATA STRUCTURE FOR MOBILE DEVICE

Our goal was to develop a small, compact and easy-to-navigate data structure for storing GTFS data on mobile devices. The data structure can be used in mobile applications such as transit schedules and trip planners, and it works even if the device has no internet connection. We distinguish between the logical structure and its physical representation. We considered the following aspects when we designed the logical structure:

• it has to be independent of the agencies (agencies can interpret the GTFS structure in many ways);
• developers who do not deal with public transport should be able to understand it and work with it easily;
• it should provide an obvious hierarchy for the most frequent queries (so it stores normalized, aggregated data);
• it should be able to store the public transport network of a city with at least the same level of detail as present solutions.

Regarding the physical representation, our goal was a structure that is small and compact, can be loaded with minimal overhead, and can be processed very quickly. In the next sections we introduce the logical structure and its potential physical representations with their advantages and disadvantages.

5.1 Logical Structure

We can consider the logical structure as an object-relational data model. It differs from the relational data model in only one respect: some fields are arrays. The main difference between our structure and both GTFS and the relational model of Gerstle [9] is that we do not store the list of stops for every trip, nor the arrival and departure times for every stop on each trip. Instead, we group trips into lines by their stops, and we store only the following information:

• the list of stops for every line,
• all the possible values for the time needed to reach every stop on a line from the first stop (journey time),
• the departure time from the first stop for every trip, and the index of the journey time to use.

Figure 1 shows the tables and the relations between them. We give a short description of the tables below. In each table, the columns marked with an asterisk constitute the primary key.

The ROUTES table has a row for every route known to the users. The name of the route has to be unique; it is the route number used in everyday life.

The ROUTE_LINES table stores information about the lines of each route. The lines belonging to one route have the same or a similar name, but their stops or the order of their stops can differ. For example, in Budapest tram route 18 has four lines: one goes from Savoya Park to Szell Kalman square, the second goes in the opposite direction, the third goes from Ujbuda to Szell Kalman square, and the fourth goes in the opposite direction.

The ROUTE_CATEGORIES table is used for grouping routes by some similarity (e.g. by district or type of vehicle). The groups are not defined in advance; they can be arbitrary, but they should be useful to the user. This data is stored only to make searching for a specific route easier.

The STOPS table gives information about the stops. Every stop has a row in the table. Several stops can have the same name, but it is recommended to give each a sub-name that distinguishes them.

The STOP_GROUPS table groups the stops belonging to an interchange, a station, or a junction with more than one stop; it can also group otherwise related stops.


Figure 1. The Logical Structure

The SHAPE_DATA field of the SHAPE_SEGMENTS table stores an array of GPS coordinates between two neighboring stops. In this way we store the exact path of the vehicles. One shape can belong to more than one line.

The TRANSFERS table contains information for potentially ambiguous pairs of stops. Generally, the transfer time between most stops can be calculated. However, a few neighboring stops are physically close to each other, yet walking between them takes more time than usual, for example because there are obstacles or traffic lights between the two stops. The TRANSFERS table contains such stops with their practical transfer time.

The LINE_STOPS table determines which stops belong to a line and in which sequence. The ARRIVAL_TIMES and DEPARTURE_TIMES fields are arrays. An element of the first array gives the time at which the vehicle arrives at this stop, measured from its departure from the first stop. Similarly, an element of the second array gives the departure time. The HEADSIGN field shows the destination of the vehicle; it needs to be filled for the first stop and for every other stop where the text on the display of the vehicle changes. The table also stores the distance traveled from the first stop in meters.

The DAY_TYPES table contains arbitrary groups of days for which a set of trips is available. These groups do not have to match the groups people use in everyday life (e.g. workdays, holidays). Depending on the implementation, the exact dates of a period can simply be listed, or a mask can be used for each day type.

The LINE_DEPARTURES table stores the departures of each line. The TIME_OFFSET field contains the starting time of a departure in seconds from midnight (applications can convert it to hours, minutes and seconds to show the exact value to the user). The FLAGS field can store additional information about the vehicle, for example whether passengers with wheelchairs can travel on it. The TIME_ID field is the index of the journey time information in the ARRIVAL_TIMES and DEPARTURE_TIMES fields of the LINE_STOPS table. It is needed because the vehicles of a line sometimes travel slower and sometimes faster, independently of the day type.

The CALENDAR_OVERRIDE table contains public holidays, irregular days off and irregular workdays. The CALENDAR_SCHOOL_HOLIDAYS table is similar, but it stores the days of the school holidays. These two tables are not used for calculations; their only purpose is to provide richer information to the user. They can be omitted if the application does not require this kind of information.

The METADATA table stores information about the database. The VALID_FROM and VALID_UNTIL columns give the period in which the database can be used.

Some tables contain a field named FLAGS, which is used for storing miscellaneous boolean values. The meaning of each bit in these fields differs from table to table. For example, the FLAGS field for routes can store whether passengers are allowed to board through all doors, or whether they have to purchase a special ticket to use that route. For lines, it can store whether the line goes directly to the depot. For departures, it can store information about a given vehicle, e.g. wheelchair accessibility. For stops, it can store whether the stop is underground or whether it is wheelchair accessible.
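To illustrate how the stored pieces combine, the following is a minimal sketch of how an application might reconstruct the absolute arrival time of one departure at one stop. The type and function names (LineStop, LineDeparture, arrival_at) are hypothetical and only mirror the fields described above; they are not taken from our implementation.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical view of one LINE_STOPS row: one relative time (seconds after
// leaving the first stop) per journey-time variant, as in the ARRIVAL_TIMES
// and DEPARTURE_TIMES array fields.
struct LineStop {
    std::vector<std::uint32_t> arrival_times;
    std::vector<std::uint32_t> departure_times;
};

// Hypothetical view of one LINE_DEPARTURES row.
struct LineDeparture {
    std::uint32_t time_offset;  // seconds after midnight at the first stop
    std::uint32_t time_id;      // index of the journey-time variant to use
};

// Absolute arrival time at a stop, in seconds after midnight.
// .at() throws std::out_of_range if time_id does not name a stored variant.
std::uint32_t arrival_at(const LineDeparture& dep, const LineStop& stop) {
    return dep.time_offset + stop.arrival_times.at(dep.time_id);
}

// Absolute departure time from a stop, in seconds after midnight.
std::uint32_t departure_at(const LineDeparture& dep, const LineStop& stop) {
    return dep.time_offset + stop.departure_times.at(dep.time_id);
}
```

Only one addition per lookup is needed, which is why the per-trip stop times of GTFS do not have to be stored explicitly.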

5.1.1 Generating the Relational Database from GTFS Structure

The transformation to the relational database is done by a Java application, which reads the whole GTFS database into memory and executes the steps described below.

The data structure was designed to be used in our trip planner applications [11]. One trip planner application covers only one city or one agency. For this reason the content of the AGENCY.TXT and FEED_INFO.TXT files is stored not in the data structure but in the application. If data of several agencies has to be stored, an AGENCIES table can be created to hold the necessary information. In this case the ROUTES table needs to contain a reference to the AGENCIES table.

The stops in the STOPS.TXT file whose location_type field is blank or zero are transformed to rows of the STOPS table, whereas stations are transformed to rows of the STOP_GROUPS table. Stops that do not have an associated station are grouped using k-means clustering (possibly also taking similarities in their names into account), and these groups also form rows in the STOP_GROUPS table, with calculated data. The street field of the STOPS table is gathered from OpenStreetMap (http://www.openstreetmap.org/).

The minimum of the start_date fields and the maximum of the end_date fields in the CALENDAR.TXT file determine the interval for which the GTFS feed is valid, and this data is stored in the VALID_FROM and VALID_UNTIL fields of the METADATA table. For every record in the CALENDAR.TXT file we calculate the set of days on which the given service calendar is effective (also taking the exceptions from CALENDAR_DATES.TXT into account). Rows with the same set of days are merged, and they are stored in the DAY_TYPES table.

The ROUTES.TXT file is converted to the ROUTES table. If two routes have the same name, description and color, they are merged. Routes without any trip are deleted. The routes are grouped manually based on various features that depend on the city. The groups are transformed to rows in the ROUTE_CATEGORIES table.
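The merging of service calendars into day types described above can be sketched as follows. This is an illustration only (our converter is a Java application): it assumes that the active dates of each service have already been expanded into a set of YYYYMMDD integers, and the names ServiceDays, DayTypes and merge_day_types are hypothetical.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// service_id -> set of active dates (YYYYMMDD), assumed to be already expanded
// from CALENDAR.TXT and CALENDAR_DATES.TXT.
using ServiceDays = std::map<std::string, std::set<int>>;

struct DayTypes {
    std::vector<std::set<int>> day_sets;                  // one entry per day type
    std::map<std::string, std::size_t> service_to_type;   // service_id -> day type index
};

// Services with an identical set of active days share one day type.
DayTypes merge_day_types(const ServiceDays& services) {
    DayTypes result;
    std::map<std::set<int>, std::size_t> seen;  // day set -> day type index
    for (const auto& entry : services) {
        const std::set<int>& days = entry.second;
        auto inserted = seen.insert(std::make_pair(days, result.day_sets.size()));
        if (inserted.second) {
            result.day_sets.push_back(days);  // first time this day set occurs
        }
        result.service_to_type[entry.first] = inserted.first->second;
    }
    return result;
}
```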


We process the TRIPS.TXT file route by route. The first step is to determine the correct direction_id for every trip. This step is needed because in some feeds the direction_id field was not used, or was not correct for some trips. Trips that travel in nearly the same direction get the same direction_id. Trips on which the vehicle travels in a loop get 0 as direction_id.

The second step is to concatenate trips if they satisfy specific conditions. If the last stop of a trip A is the same as the first stop of a trip B, trip B departs only a short time after the arrival of trip A, and the two trips have the same block_id and direction_id, we concatenate them. This means that passengers can travel on trips A and B without transferring, so together they form a whole line of a route. Conversely, if a vehicle spends more time at a stop than seems sensible for the passengers, we split the trip at that stop into two distinct trips.

In the third step the newly generated trips are grouped based on the list of their stops (which is determined from the STOP_TIMES.TXT file). Two trips are in the same group if they have the same stops in the same order. These groups form the rows of the ROUTE_LINES table. For every route line, we store the list of stops in the LINE_STOPS table.

Every trip assigned to the same line travels through the same stops, but the time needed to travel through them might differ. We calculate all the possible journey times for every line and store this information in the ARRIVAL_TIMES and DEPARTURE_TIMES fields of the LINE_STOPS table. For every trip in TRIPS.TXT we insert a new record into the LINE_DEPARTURES table. The TIME_ID field stores an index that determines the journey time used by this departure (that is, which element of the ARRIVAL_TIMES and DEPARTURE_TIMES arrays should be accessed for each stop to get the relative time needed to reach that stop).
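The third step and the journey time calculation can be sketched as below. This is a simplified illustration, not our converter: it keeps a single time per stop instead of separate arrival and departure times, stores each journey time variant as one profile per line, and uses hypothetical names (Trip, Line, Departure, group_trips).

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical trip after TRIPS.TXT and STOP_TIMES.TXT have been joined.
struct Trip {
    std::vector<std::uint32_t> stop_ids;  // stops in travel order
    std::vector<std::uint32_t> times;     // seconds after midnight at each stop
};

struct Departure {
    std::uint32_t time_offset;  // start time at the first stop
    std::size_t time_id;        // index into Line::journey_times
};

struct Line {
    std::vector<std::uint32_t> stop_ids;
    std::vector<std::vector<std::uint32_t>> journey_times;  // distinct relative profiles
    std::vector<Departure> departures;
};

// Trips with the same stops in the same order form one line; each distinct
// relative time profile is stored once and referenced by index.
std::vector<Line> group_trips(const std::vector<Trip>& trips) {
    std::vector<Line> lines;
    std::map<std::vector<std::uint32_t>, std::size_t> line_index;
    for (const Trip& trip : trips) {
        if (trip.stop_ids.empty() || trip.times.size() != trip.stop_ids.size()) {
            continue;  // skip malformed input in this sketch
        }
        auto found = line_index.find(trip.stop_ids);
        if (found == line_index.end()) {
            found = line_index.insert({trip.stop_ids, lines.size()}).first;
            lines.push_back(Line{trip.stop_ids, {}, {}});
        }
        Line& line = lines[found->second];

        std::vector<std::uint32_t> profile;
        for (std::uint32_t t : trip.times) {
            profile.push_back(t - trip.times.front());
        }
        std::size_t time_id = 0;
        while (time_id < line.journey_times.size() &&
               line.journey_times[time_id] != profile) {
            ++time_id;
        }
        if (time_id == line.journey_times.size()) {
            line.journey_times.push_back(profile);
        }
        line.departures.push_back(Departure{trip.times.front(), time_id});
    }
    return lines;
}
```

The linear search over the variants is acceptable here under the assumption that a line has only a handful of distinct journey time profiles.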
The information in these tables is enough to calculate the location of any vehicle at a given time, without storing all the data explicitly in the database.

The SHAPES.TXT file is used to generate the SHAPES table. The original GTFS file contains the shapes of whole routes, which means that it stores the path between two neighbouring stops more than once. Generally we can assume that vehicles travel on only one path between two neighbouring stops. In practice, the feeds we worked with contained only a few differences among the paths between two neighbouring stops, and these differences were no more than a few meters (e.g. when a tram and its substitute bus use the same stops but not the same lane). We therefore store only one shape between two neighbouring stops. This conversion causes a minimal, unimportant loss of information, but it gives significant compression. To generate the rows of the SHAPES table, we split the original polyline at the coordinates of the stops and compress the polyline segments using the Ramer–Douglas–Peucker algorithm.

The TRANSFERS.TXT file is used to generate the TRANSFERS table. A row is inserted into the table if a transfer between the stops is not possible or requires a minimum amount of time. Our data structure does not handle timed transfer points, because this concept does not exist in the cities we worked with.

If the value of the exact_times field in the FREQUENCIES.TXT file is 1, the data is converted to trips. A value of zero means that the trips are not exactly scheduled. Our data structure cannot handle this case, but in practice we did not meet such a feed.

We do not use the FARE_ATTRIBUTES.TXT and FARE_RULES.TXT files, because cities usually have complex fare rules that are hard to represent in this format, so this information is often missing from GTFS databases. Instead, we decided to implement this functionality in the application.
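For the shape compression step mentioned above, a textbook form of the Ramer–Douglas–Peucker algorithm is sketched below. It assumes planar coordinates (for example, GPS points projected to meters), and the tolerance epsilon is a free parameter; neither the projection nor the tolerance used in our converter is specified here, and the names Point and simplify are hypothetical.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

struct Point { double x; double y; };  // planar coordinates (assumption)

// Perpendicular distance from p to the line through a and b.
double distance_to_line(const Point& p, const Point& a, const Point& b) {
    const double dx = b.x - a.x;
    const double dy = b.y - a.y;
    const double len = std::hypot(dx, dy);
    if (len == 0.0) return std::hypot(p.x - a.x, p.y - a.y);
    return std::fabs(dy * (p.x - a.x) - dx * (p.y - a.y)) / len;
}

// Marks the points between first and last that must be kept.
void simplify_range(const std::vector<Point>& pts, std::size_t first,
                    std::size_t last, double epsilon, std::vector<bool>& keep) {
    double max_dist = 0.0;
    std::size_t index = first;
    for (std::size_t i = first + 1; i < last; ++i) {
        const double d = distance_to_line(pts[i], pts[first], pts[last]);
        if (d > max_dist) { max_dist = d; index = i; }
    }
    if (max_dist > epsilon) {  // farthest point is significant: keep it, recurse
        keep[index] = true;
        simplify_range(pts, first, index, epsilon, keep);
        simplify_range(pts, index, last, epsilon, keep);
    }
}

// Returns the simplified polyline; the end points are always kept.
std::vector<Point> simplify(const std::vector<Point>& pts, double epsilon) {
    if (pts.size() < 3) return pts;
    std::vector<bool> keep(pts.size(), false);
    keep[0] = true;
    keep[pts.size() - 1] = true;
    simplify_range(pts, 0, pts.size() - 1, epsilon, keep);
    std::vector<Point> out;
    for (std::size_t i = 0; i < pts.size(); ++i) {
        if (keep[i]) out.push_back(pts[i]);
    }
    return out;
}
```

Because the polyline is first split at the stops, the stop positions are the end points of each segment and are therefore never removed by the simplification.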


5.2 Physical Representation

There is more than one way to realize the logical structure. We have to keep in mind that we want to store and use it on mobile devices. In this article we show three solutions: the object-relational database, the C structures with memory snapshot, and protocol buffers.

The main advantages of the object-relational database are that it is well known, many tools can work with it, it supports optional fields, and the data structure can be modified more easily than a binary format. It is the most straightforward solution, but it has some disadvantages. The database is large for a mobile device, and devices answer queries slowly. For these reasons, this representation does not support complex processing on mobile devices. Our application, which uses the data structure, provides offline trip planning. This needs quick data processing, so the object-relational representation is not a good choice for it.

The C structures with memory snapshot are a binary format, a serialized data structure. Its advantages are that the database is very small, and the application that uses it can also be very small and very fast. Its disadvantage is that the structure is fixed and difficult to modify: changing the data structure implies that the program code also has to be modified. Our application uses the C structures with memory snapshot, because this gives the best performance for an offline mobile application with limited memory and processing capacity.

Storing the data using protocol buffers is similar to the binary format, but the stored data contains metadata. This means that the data structure can be modified more easily by adding optional fields, and adding optional fields does not require changing the program code. This makes the structure flexible, although it cannot be radically remodeled. It is binary, so it is smaller than the relational database. The application that uses this structure has to contain generated code, so it is larger and slower than with the C structures with memory snapshot. In our application we do not need the ability to modify the data structure without modifying the program code.

5.2.1 C Structures

The binary file is created from the relational database by a dedicated C++ application. The C code for the data structure can be found in the appendix. Our trip planner application uses this data structure.

The goal of our data structure is that the more often an entity occurs in the dataset, the less space should be needed to store it. Because of this, we split the tables of the object-relational model into two categories.

The first category includes the following tables: CALENDAR_OVERRIDE, CALENDAR_SCHOOL_HOLIDAYS, DAY_TYPES, ROUTE_CATEGORIES, ROUTES, ROUTE_LINES, STOP_GROUPS, and STOPS. These tables contain relatively few rows, and they are simply mapped to arrays of structures with similar names. Each structure has a field named ID, which contains a unique identifier starting from one and increasing by one for each entry. These ids match the position of the corresponding element in the appropriate array.

The other tables are handled differently. LINE_STOPS is broken into three structures. The content of the HEADSIGN field is stored in LineHeadsignEntry structures, because this field is NULL for most stops. We chose not to store the different arrival and departure times for each stop, but to store the different sets of travel times for each line, using LineTravelTimeEntry structures. This method does not require less space, but this way closely related data is stored in the same array, which helps compressing the database in a later step. The other fields of the LINE_STOPS table are stored using LineStopEntry structures.

The LINE_DEPARTURES table is grouped by all fields except time_offset. These groups are stored as LineDepartureGroupEntry structures, and the time offsets of each group are stored as an array. This method allows us to store the data efficiently, because there are significantly fewer groups than rows in the original table. It also makes queries faster, because the day type and flags fields do not need to be compared to a specific value for each row. The elements of the time offsets array are sorted, so departure times near a given time can be found without scanning the whole array.

Shapes and transfers are sorted by their first and second stops to make binary search possible.

Structures are connected using pointers where possible, or ids where the size of the pointers would cause too much overhead. This memory structure can be queried easily, because all data can be accessed by following pointers or by finding an element in the appropriate array. The queries are also efficient, because the number of condition checks is minimized and no conversion is needed between data types.

Figure 2. The Data in the Memory
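The benefit of grouping the departures can be illustrated with a small lookup sketch. The DepartureGroup type below is loosely modelled on LineDepartureGroupEntry but uses std::vector instead of the snapshot's own array type, and next_departure is a hypothetical helper, so this should be read as an assumption-laden illustration rather than our query code.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Loosely modelled on LineDepartureGroupEntry (simplified storage).
struct DepartureGroup {
    std::uint32_t day_type_id;
    std::uint32_t time_id;
    std::uint32_t flags;
    std::vector<std::uint32_t> time_offsets;  // sorted, seconds after midnight
};

// First departure at or after `now` on the given day type, or -1 if none.
std::int64_t next_departure(const std::vector<DepartureGroup>& groups,
                            std::uint32_t day_type_id, std::uint32_t now) {
    std::int64_t best = -1;
    for (const DepartureGroup& group : groups) {
        if (group.day_type_id != day_type_id) continue;  // one check per group, not per row
        // Binary search works because the offsets are stored in sorted order.
        auto it = std::lower_bound(group.time_offsets.begin(),
                                   group.time_offsets.end(), now);
        if (it != group.time_offsets.end() && (best < 0 || *it < best)) {
            best = *it;
        }
    }
    return best;
}
```

The day type is compared once per group, and the search inside a group is logarithmic in the number of departures.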

5.2.2 Memory Snapshot

Because this data structure is designed to be processed in memory, it needs to be serialized on the server side and restored later on the clients. All data is serialized to a single file. Figure 2 shows how the data is laid out in memory.

The data file starts with a header that contains information about the file, e.g. version numbers, size, checksum and preferred loading address. The header is followed by the representation of the Database structure. Because we use pointers, all other data can appear anywhere in the file; even overlapping structures and arrays are allowed. The database does not need to be modified after it is generated, so this causes no problems or inconveniences.

The current implementation traverses the database starting from the Database structure and serializes all other structures as it encounters them. Strings and arrays of primitive types are collected, checked for duplicates (also taking partial matches into account) and written to the end of the file. The final step is to adjust the pointers to match the new locations. The pointers in the file are relative not to the beginning of the file but to a predefined address, which is stored in the header.

Additionally, since the positions of the elements (except for the header) are not predefined, the database can contain arbitrary data. This makes it possible to store different versions of some structures in the same file, which makes migration between application versions easier.

Loading the database is relatively straightforward and can be implemented efficiently:

1. On 32-bit architectures (currently the most common case for mobile devices), if there is room for the database at the preferred address, the whole file is loaded to that address. No other operation is needed; the data is usable as is. In memory-constrained environments, or if the database file covers a large area, memory mapping can be used instead of direct file I/O.
2. On 32-bit architectures, if the preferred memory area is occupied by something else, the database needs to be loaded to a different (arbitrary) address, and the pointers need to be adjusted. This can be done by recursively walking down from the root while ensuring that each pointer is modified only once (this is needed because the database contains loops).
3. On 64-bit architectures, the pointer size of the system does not match the pointer size used in the file, so the references cannot be used directly. Our implementation solves this problem with a wrapper class that treats all references as relative pointers to the preferred address.

Table 1. Size of Data Structures of Some Cities

                        Budapest (Hungary)         Perth (Australia)          Auckland (New Zealand)     Madrid (Spain)             Edmonton (Canada)
1.  GTFS ZIP            15.2 MB (1)                26.6 MB (2)                4.39 MB (3)                13.1 MB (4)                19.6 MB (5)
2.  GTFS extracted      111 MB                     124 MB                     36.9 MB                    94.7 MB                    219 MB
3.  Object-relational   29.2 MB                    77.5 MB                    50.1 MB                    15.4 MB                    96.0 MB
4.  Binary              1.27 MB                    3.31 MB                    1.29 MB                    1.31 MB                    1.52 MB
5.  BinaryGZIP          0.64 MB                    1.35 MB                    0.36 MB                    0.75 MB                    0.32 MB
6.  Validity period     2014-06-26 to 2014-08-01   2014-06-14 to 2014-08-31   2014-06-21 to 2014-07-25   2014-06-03 to 2015-12-31   2014-06-17 to 2014-08-30
7.  Number of routes    329                        637                        1388                       205                        604
8.  Number of stops     5350                       13263                      6092                       4628                       6418
9.  Number of trips     115745                     36390                      15345                      75771                      57210
10. Population (6)      1.74 million               1.97 million               1.42 million               3.22 million               0.81 million
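The pointer handling in loading cases 2 and 3 of Section 5.2.2 can be illustrated with the following sketch. It assumes that the first byte of the file corresponds to the preferred address and that references are stored as 32-bit values; SnapshotHeader, rebase and read_ref are hypothetical names, and the real header contains more fields than shown.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

// Hypothetical header; the real one also holds version numbers, size, checksum, etc.
struct SnapshotHeader {
    std::uint32_t preferred_address;  // address the stored pointers assume
};

// Case 2: rewrite a stored 32-bit pointer for a different load address.
std::uint32_t rebase(std::uint32_t stored, std::uint32_t preferred,
                     std::uint32_t actual) {
    return stored - preferred + actual;
}

// Case 3: treat the stored value as relative to the preferred address and
// copy the referenced object out of the loaded image, wherever it lives.
template <typename T>
T read_ref(const std::vector<unsigned char>& image,
           const SnapshotHeader& header, std::uint32_t stored) {
    // Unsigned wrap-around turns references below the base into huge offsets,
    // which the bounds check then rejects.
    const std::size_t offset = static_cast<std::uint32_t>(stored - header.preferred_address);
    if (offset > image.size() || image.size() - offset < sizeof(T)) {
        throw std::out_of_range("reference outside snapshot");
    }
    T value;
    std::memcpy(&value, image.data() + offset, sizeof(T));
    return value;
}
```

Copying with std::memcpy avoids alignment and aliasing problems in this sketch; a production wrapper class could instead return a typed pointer into a memory-mapped file.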

6. CONCLUSION

Table 1 shows the size of the GTFS feed and of our data structure for several cities. The measurements were taken with the latest GTFS feed available for each city on 27 June 2014. The first row of the table shows the size of the GTFS ZIP file, which can be downloaded from the corresponding website. This file needs to be extracted in order to work with it, and the second row shows the size of the extracted files. The third row shows the size of the database based on our logical data structure, whereas the fourth row shows the size of the data file created using our C structures with memory snapshot, which is used by the mobile applications. The applications download a compressed version of this file, whose size is shown in the fifth row of the table. The population and the number of routes, stops and trips are included to make the sizes of the cities comparable.

Sources:
1: http://www.bkk.hu/gtfs/budapest gtfs.zip
2: http://www.transperth.wa.gov.au/TimetablePDFs/GoogleTransit/google transit.zip
3: https://at.govt.nz/bus-train-ferry/more-services/google-transit-feed/
4: https://servicios.emtmadrid.es:8443/gtfs/transitemt.zip
5: http://webdocs.edmonton.ca/transit/etsdatafeed/google transit.zip
6: Based on Wikipedia

The binary file created using our data structure is an order of magnitude smaller than the original GTFS feed, so it can be stored on a mobile device without problems. The compressed version is even smaller, which helps users save bandwidth and allows us to publish frequent updates.

The data structure introduced in this article is implemented in our transit schedule application [11], which can be downloaded from Google Play. The name of the application is “Budapesti Menetrend”.
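As a quick check of the bandwidth argument, the sketch below compares the two files a user would actually download: the GTFS ZIP (row 1 of Table 1) and our compressed binary (row 5). The values are copied from the table; the names CityDownload, table1_downloads and reduction_factor are hypothetical. With these values, the factor is above ten for each of the five cities.

```cpp
#include <string>
#include <vector>

struct CityDownload {
    std::string city;
    double gtfs_zip_mb;     // row 1 of Table 1
    double binary_gzip_mb;  // row 5 of Table 1
};

// Download sizes as listed in Table 1.
std::vector<CityDownload> table1_downloads() {
    return {
        {"Budapest", 15.2, 0.64},
        {"Perth", 26.6, 1.35},
        {"Auckland", 4.39, 0.36},
        {"Madrid", 13.1, 0.75},
        {"Edmonton", 19.6, 0.32},
    };
}

// How many times smaller the compressed binary is than the GTFS ZIP.
double reduction_factor(const CityDownload& c) {
    return c.gtfs_zip_mb / c.binary_gzip_mb;
}
```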

Appendix

struct Database {
    (...)
    date valid_from;
    date valid_until;
    array calendar_overrides;
    array calendar_school_holidays;
    array day_types;
    array route_categories;
    array routes;
    array stops;
    array stops_groups;
    array shapes;
    array transfers;
};

struct CalendarOverrideEntry {
    uint id;
    date from_date;
    date to_date;
    uint new_type;
    cstring description;
};

struct CalendarSchoolHolidayEntry {
    uint id;
    date from_date;
    date to_date;
    cstring description;
};

struct DayTypeEntry {
    uint id;
    cstring name;
    array mask;
};

struct RouteCategoryEntry {
    uint id;
    cstring name;
    uint color;
    uint flags;
    array routes;
};

struct RouteEntry {
    uint id;
    cstring name;
    cstring long_name;
    cstring description;
    uint color;
    ref category;
    array lines;
};

struct RouteLineEntry {
    uint id;
    ref route;
    uint direction;
    cstring name;
    uint flags;
    array headsigns;
    array stops;
    array travel_times;
    array departure_groups;
};

struct LineHeadsignEntry {
    uint stop_index;
    cstring name;
};

struct LineStopEntry {
    uint stop_id;
    uint distance_traveled;
};

struct LineTravelTimeEntry {
    vararray arrival_times;
    vararray departure_times;
};

struct LineDepartureGroupEntry {
    uint day_type_id;
    uint time_id;
    uint flags;
    array time_offsets;
};

struct LocationEntry {
    int latitude;
    int longitude;
};

struct StopEntry {
    uint id;
    uint group_id;
    cstring name;
    cstring subname;
    cstring street;
    LocationEntry location;
    uint orientation;
    uint flags;
};

struct StopGroupEntry {
    uint id;
    cstring name;
    LocationEntry location;
    array stops;
};

struct ShapeEntry {
    uint first_stop_id;
    uint next_stop_id;
    uint point_count;
    vararray packed_points;
};

struct TransferEntry {
    uint first_stop_id;
    uint next_stop_id;
    uint transfer_time;
};

ACKNOWLEDGEMENTS

The publication was supported by the TÁMOP-4.2.2.C-11/1/KONV-2012-0001 project. The project has been supported by the European Union, co-financed by the European Social Fund.

References

[1] General Transit Feed Specification, 2012. https://developers.google.com/transit/gtfs/.
[2] TRANSMODEL, 2001. http://www.transmodel.org/.
[3] “A helyközi közösségi közlekedés Transmodel szabványú menetrendi és hálózati adatokat tartalmazó adatbázis készítés és minősítés feltétel-rendszere” [Requirements for building and certifying a database of Transmodel-standard timetable and network data for interurban public transport]. http://www.kti.hu/uploads/KMK/2011/Transmodel%20Tud%C3%A1st%C3%A1r/KMK%20Transmodel%20szolg%C3%A1ltat%C3%A1s/Transmodel_kovetelmenyek110422.pdf.
[4] TransXChange, 2014. https://www.gov.uk/government/collections/transxchange.
[5] A. Antrim, S. J. Barbeau, et al., “The Many Uses of GTFS Data: Opening the Door to Transit and Multimodal Applications,” Location-Aware Information Systems Laboratory at the University of South Florida, 2013.
[6] B. Ferris, K. Watkins, and A. Borning, “OneBusAway: A Transit Traveler Information System,” in Mobile Computing, Applications, and Services, pp. 92–106, Springer, 2010.
[7] S. J. Barbeau, P. L. Winters, and N. L. Georggi, “Travel Assistant Device to help transit riders,” Center for Urban Transportation Research, 2010.
[8] M. Braga, M. Y. Santos, and A. Moreira, “Integrating Public Transportation Data: Creation and Editing of GTFS Data,” in New Perspectives in Information Systems and Technologies, Volume 2, pp. 53–62, Springer, 2014.
[9] D. G. Gerstle, Understanding bus travel time variation using AVL data. PhD thesis, Massachusetts Institute of Technology, 2012.
[10] C. Bick and R. Damphouse, “GTFS SQL Import Tool,” 2010. http://cbick.github.io/gtfs_SQL_importer/html/index.html.
[11] T. Szincsak and A. Vagner, “Public transit schedule and route planner application for mobile devices,” 2014.
