Data warehouse for soybeans and corn market on Brazil

Data warehouse for soybeans and corn market on Brazil F. Elias Correa1, P.L. Pizzigatti Corrêa1, J.R. Almeida Junior1, L.R. Aparecido Alves2 and A. Ma...
3 downloads 1 Views 914KB Size
Data warehouse for soybeans and corn market on Brazil F. Elias Correa1, P.L. Pizzigatti Corrêa1, J.R. Almeida Junior1, L.R. Aparecido Alves2 and A. Mauro Saraiva1 1 Agricultural Automation Laboratory, Department of Computer Engineering, Polytechnic School, University of São Paulo, São Paulo, SP, P.O. Box 61548, 05424-970, Brazil; [email protected]; [email protected]; [email protected]; [email protected] 2 Center for Advanced Studies on Applied Economics, School of Agriculture‚ Luiz de Queiroz’, University of São Paulo, Piracicaba, SP, P.O. Box 132, 13400-970, Brazil; [email protected] Abstract Brazilian agribusiness generates a large amount of trade information considering all production chains. The volume of money involved in it worldwide is huge. However there are few solutions for management information systems that support decision making based on the operational data generated by each chain, it means, systems that provide consistent information toward agribusiness. This is important not only to the farmers, but also to the government and companies that need parameters about agribusiness. So, this study aimed are modeling and create a data warehouse based on grain market in Brazil, using information about prices and regions of soybeans and corn. This study is based on history database on agricultural market for up to ten years. The proposed paper uses methods based on dimensional modeling to model the agricultural market, bus matrix and open source data warehouse tools. This data warehouse allows researchers to navigate using a web browser through historical data and decision-making information in a friendly way. Also, the model can be used in future works of data warehouse to increase amount and quality of information for agriculture. Keywords: dimensional model, agricultural market, decision-making, grain market Introduction The availability of consistent agricultural information is essential in order to have success in business, adapting to the new market trends. Despite generating a large volume of financial data, data handled by these systems are not easily accessible or are managed in an inappropriate manner. Therefore, this system is prepared to provide support to decision-making or to help to define policies for agriculture issues. Some institutions, such as the Center for Advanced Studies in Applied Economics - CEPEA (CEPEA, 2008) and the Brazilian Institute of Geography and Statistics - IBGE (IBGE, 2008), work to reduce the shortage of information. They work on management, storage, processing and creating databases for agribusiness. The importance of using a database system to support the decision-making has been highlighted in some research with focus on agriculture, such as Toni (2000) that shows the use of Data Marts in agricultural cooperatives. However, there are few initiatives to implement decision-support tools specifically to agribusiness, justifying the development of this work, showing the benefits of applying these tools and sharing experiences in modeling data systems to support decision in agribusiness.

EFITA conference ’09

675

There are few works focused on agribusiness, specially in the Brazilian market, and consequently not allowing the creation of a system that helps the decision-making process based on historical databases. Thus, the work here presented is part of a research that applies data warehouse techniques on databases of corns and soybeans markets on regions of Brazil. Data warehouse tools and models Data warehouse is an information system solution that assists and optimizes the queries and thus creates an environment for analysis of historical data, using techniques known as OLAP (Online Analytical Processing) systems (Schulze, 2007). The sophistication of analytical tools known as OLAP systems provide functionality to meet the queries for analysis of historical data of almost every kind of application, which differentiates OLAP systems from operational data kept in relational database (Elmasri et al., 2000). The data used by OLAP systems is stored in a special database called data warehouse. The implementation of the data warehouse is divided into 4 parts: design, build, access and management. (Kimball and Ross, 2002). According Inmon and Hackathorn (1999), ‘Data Warehouse includes a set of consolidated data, organized by subject and that evolve over time, used to support decision-making process.’ The aim is to create an environment to analyze grain trade prices, using data warehouse concepts described in the next sections. Dimensional modeling The dimensional modeling is characterized by presenting data in an environment of an intuitive and standardized way and allowing access to a high-performance system (Kimball and Ross, 2002). The standard schema for dimensional modeling is called Star Schema. The schema is divided into two main structures, the numeric data of business management is stored in a unique table or fact table, and the descriptions of the attributes are stored in dimension tables (Inmon, 1999), as presented in Figure 1.

Figure 1. Dimensional modeling of data warehouse. Figure 1: Dimensional Modeling of Data Warehouse.

676 2.2. Bus matrix

EFITA conference ’09

The bus matrix can be used to describe the dimensional modeling using a matrix

Bus matrix The bus matrix can be used to describe the dimensional modeling using a matrix structure. It shows the application domain used in the data warehouse in a intuitive way. The lines of the matrix are the business areas, representing the fact tables of the dimensional modeling and the columns represent the dimension tables, which are descriptive information toward the business. The cells of the matrix are marked if they contain common events into the business area and their descriptive (Kimball and Ross, 2002). In conclusion, bus matrix is flexible to include new dimensions and fact tables, maintaining the soft representation. Data warehouse tools The tools used in this paper are those defined as OLAP, typically used in for Data Warehouse systems (Nilakanta, 2008). The selected for this research followed the criterion of being open source. There are other implementations with proprietary tools, using by database copyright, such as Oracle or Microsoft tools. However, the open source tools used in the research were stable, portable and low-cost, which justifies their use in the implementation. The tools used are presented in the list below: • Apache-Tomcat (Apache, 2008): Web pages server software. It works together with Tomcat server, which is used to work with pages based on the Java language. • Mondrian (Mondrian, 2008): This software is responsible for applying the data warehouse concepts, allowing the construction of information cubes, using graphic and friendly interface. In addition, it is possible to create graphics and ad hoc queries, for instance. • Mysql-Server (Mysql, 2008): Database Management System responsible for storing the data warehouse tables. Results and dimensional modeling applied on grain market This section presents the application of dimensional modeling and bus matrix to define the architecture of data warehouse used on OLAP application for grain market. Star Schema and bus matrix for grain chains The star model was created in order to make the analysis of grains prices (soybeans and corn) in different Brazilian regions, containing data that define where the prices are collected. The data have their relevance to the market and are represented in a relation called Dimension_Region. The relation Dimension_Product contains a description of the grains products and the Dimension_Index has information toward the representative price of the product that is the bases for financial market. Other dimensions described are dates, units and taxes used. The fact table is named Fact_grain market and has data toward average, maximum and minimum prices, using term values and taxes. The fact table and dimensions tables are presented in Figure 2. As explained, it is possible to represent the Star Schema model by a bus matrix, which facilitates viewing the areas of the business organization. Table 1 shows the bus matrix for grain chain. Implementation of data warehouse Figure 3 shows the process for implementing the data warehouse, where the operational databases has diary data toward the grain market, taxes, index and other, then these data are collect by ETL (Extract, Transformation and Load) process.

EFITA conference ’09

677

product that is the bases for financial market. Other dimensions described are dates, units and taxes used. The fact table is named Fact_grain market and has data toward average, maximum and minimum prices, using term values and taxes. The fact table and dimensions tables are presented in Figure 2.

Figure 2. Dimension modeling for grains. Figure 2: Dimension modeling for grains

3.2. As Implementation data warehouse explained, it isof possible to represent the Star Schema model by a bus matrix, which facilitates viewing the areas the business organization. Table 1 shows the bus matrix for Table 1. Bus matrix for grainofmarket. On figure 3 shows the process for implementing the Data Warehouse, where the grain chain. operational databases has data the grain market, taxes, index and other, then Table 1: shares Busdiary matrix fortoward grain Index market. Data marts/dimension Dates prices Regions these data are collect by ETL (Extract, Transformation and Load) process. Products Taxes In this process, data are organized by date, types and applied consistent checks. After

Products

X X

X X

Taxes

Regions

Index Prices

Dates

Corn X X the data are load to data warehouse. Finally,X it is used OLAP,Xthat meansXonline analytical processing to access and navigateXon the information stored. Soybeans X X

The OLAP tool called Mondrian was used. Other software tools used were: Mysql data base, Web server Apache / Tomcat, installed on a server with Linux operating system.

Data Marts/ Dimension Shares Corn Soybeans

X X

X

X

Figure 3: The structure of Data Warehouse. Figure 3: The structure of data warehouse.

The database that maintains data from Data Warehouse manager uses the MySQL database management system. The process ETL obtains, from the operational databases, the data to be inserted in the data warehouse, by means of queries in SQL language. In this process, data are organized by date, types and applied consistent checks. After the data are Then scripts were created in the programming language Perl language in order to load to data warehouse. Finally, it is used OLAP, thatvalues means online analytical verify certain consistency parameters, such as date, and duplicity of files.processing to access and navigate on the information stored. In addition, there are other Perl scripts applied in the operational databases, providing The toolincalled Mondrian was used. Other software used were: Mysql data base, Web theOLAP insertion, the fact and dimension tables, of all the datatools useful to the Data Warehouse.

server Apache / Tomcat, installed on a server with Linux operating system.

3.3. Using data from data warehouse for analysis of grain market

The Mondrian tool was used in this work to create the environment for analysis the historical database of grain market that will be present below. The Figure 4 shows the navigation and extraction of queries provided by the software on the Data Warehouse. 678 conference It is possible to see on the figure 4 prices of some products,EFITA like prices of corn ’09 growerlevel, and tree region were the prices were collect, like Rio Verde city located in Goias states in Brazil.

The database that maintains data from data warehouse manager uses the MySQL database management system. The process ETL obtains, from the operational databases, the data to be inserted in the data warehouse, by means of queries in SQL language. Then scripts were created in the programming language Perl language in order to verify certain consistency parameters, such as date, values and duplicity of files. In addition, there are other Perl scripts applied in the operational databases, providing the insertion, in the fact and dimension tables, of all the data useful to the data warehouse. Using data from data warehouse for analysis of grain market The Mondrian tool was used in this work to create the environment for analysis the historical database of grain market that will be present below. The Figure 4 shows the navigation and extraction of queries provided by the software on the data warehouse. It is possible to see on the Figure 4 prices of some products, like prices of corn growerlevel, and tree region were the prices were collect, like Rio Verde city located in Goias states in Brazil. Based on the historical information of the grain market provided by data warehouse, it was observed that the production of grains in Brazil has at least two characteristics. First there is the dispersion of production, especially for the regions South and the Midwest, where the period of planting and harvesting is different, and the moment of decision-making by the producer. Second, that in

Figure 4: Mondrian interface applied soybeans corn data. Figure 4. Mondrian interface applied on on soybeans andand corn data. Based on the historical information of the grain market provided by data warehouse, it was observed that the production of grains in Brazil has at least two characteristics. First there EFITA conference ’09 679 is the dispersion of production, especially for the regions South and the Midwest, where the period of planting and harvesting is different, and the moment of decision-making by the producer. Second, that in Brazil there are at least two distinct seasons, summer and winter,

Brazil there are at least two distinct seasons, summer and winter, when the crops compete in an area between them. In summer, the main competition is between soybeans and corn. The choice of the producer from another culture depends, among other factors, the relationship between the prices of products. If the price of soybeans are more attractive, it chooses for its planting, otherwise the corn. In Figure 5 this relationship is observed. In 2007, the ratio was almost 2 below, with the lowest values being observed at the beginning and end of the year. this ratio indicated the attractiveness of planting corn at the expense of soybean, which resulted in significant growth of maize production in the 2007/08 season in Brazil. The following year, however, the price of maize decreased and increased the soybean, making the relationship grow, showing attraction to soybeans. As a result, the 2008/09 season, the area planted with maize in Brazil and decreased soybean increased. Conclusion The use of Data Warehouse in the decision support systems for agriculture has a large volume of financial information that is critical for decision-making and analysis. Navigation and crosses provided by the data warehouse system are flexible and easy, allowing to obtain dynamic data from historical series of grain marketing prices. This work defined a model and implemented a Data Warehouse in a systematic way. It is not yet a complete implementation, because several points are still being defined and improved. This model aims at modeling grain’s market dimension and facts This model is intended to be extended to other agricultural trade chains, creating other Data warehouses (or data marts) to add to the already existing one(s). Other important issues to be studied and implemented in the next stages of this work are related to the use of Data Mining, ETL techniques through Talend software and security policies for access.

Figure 5:about Graphic about relation Figure 5. Graphic relation prices. prices.

4. Conclusion The use of Data Warehouse in the decision support systems for agriculture has a large volume of financial information that is critical for decision-making and analysis. 680 EFITA conference ’09 Navigation and crosses provided by the Data Warehouse system are flexible and easy, allowing to obtain dynamic data from historical series of grain marketing prices. This work defined a model and implemented a Data Warehouse in a systematic way. It

References Apache tomcat. Software for Server web site. Available at: http://tomcat.apache.org/. Accessed 10 july 2008. Cepea - Center for Advanced Studies on Applied Economics. Available at: http://www.cepea.esalq.usp.br. Accessed 22 October 2008. Elmasri, R. Navathe, S. B. Fundamentals of Database Systems. 3ª Ed. Menlo Park: Addison-Wesley, 2000. P. 841-872. IBGE – Brazilian Institute of Geography and Statistic. Available at: http://www.ibge.gov.br/home/. Accessed 20 May de 2008. Inmon, W. H., Management Data Warehouse, Makron Books, 1999. Kimball, R.;Ross, M. The Data Warehouse Toolkit. 2° Ed. Rio de Janeiro: Publisher: Campus, 2002. 494p. Mondrian – OLAP tool for data warehouse. Available at: http://mondrian.pentaho.org. Accessed July, 12, 2007. MYSQL. Software MYSQL for databases. Available at: http://www.mysql.com. Accessed 14 July 2007. Nilakanta S., Scheibe K., Rai A., Dimensional issues in agricultural data warehouse designs, Computers and Electronics in Agriculture. Volume 60, Issue 2, pages. 263-278., March 2008. Schulze C., Spilke J., Lehner W., Data modeling for Precision Dairy Farming within the competitive field of operational and analytical tasks, Computers and Electronics in Agriculture. Volume 59, Issues 1-2, pages. 39-55, November 2007. TALEND. Web site. Software TALEND. Available at http://www. http://www.talend.com/ >. Acessed Agust. 01, 2007. Toni, J. A. Data Marts in agricultural cooperatives. Master Degree. Universidade Federal de Santa Catarina, Florianópolis. 172p. 2000. Vassiliadis, P., Simitsis, A., and Skiadopoulos, S. Conceptual modeling for ETL processes. ACM, August 2002.

EFITA conference ’09

681

Suggest Documents