INTERFACE OF THE SAS SYSTEM BY YAFOOL

Author: Rose Flynn

Hedia Mhiri Sellami - Centre de Calcul El Khawarizmi

1 Introduction

The users of statistical and data analysis packages come from different fields, such as economics and medicine. This diversity of fields brings a diversity of backgrounds in computing and statistics. We distinguish users who are familiar with data analysis and with the handling of its packages from those who are not. All of these users face the following problems:

* the difficulty of choosing the data analysis methods which solve their problems
* the difficulty of expressing their requests in the syntax of the selected package [1]
* the difficulty of interpreting the results given by the package.

These difficulties lead us to consider an interface between the user and the data analysis package [1]. This paper proposes an interface between the data analysis package SAS* [2,3] and its end users, by means of the object oriented language YAFOOL** [4]. We present the goals of the interface and justify the object oriented approach. We also present the structure of the objects needed to describe the interface. We explain how the interface executes the user's requests, and present the code generation and the interpretation of the results.

2 The concepts of data analysis

Data analysis is a set of analytic methods whose goal is to describe data tables, to determine the relationships between observations and variables, and to summarize the information in the table. An analytic method corresponds to a given data analysis theory, with its acceptable data table types and acceptable variable types. In addition, each analytic method has a specific set of results. Some analytic methods have points in common, such as sharing basic concepts or giving the same results, so these methods are classified into groups called analytic groups, such as the FACTORIAL group, composed of principal component analysis and correspondence analysis. Depending on the data analysis package, an analytic method can be implemented by one or more procedures of the package.

* SAS is a registered trademark of SAS Institute Inc.
** YAFOOL (Yet Another Frame Object Oriented Language) is a product of SEMA-GROUP.

3 The goals of the interface

When using a data analysis package, some users know the results they wish to extract from their data or the procedures they want to use, while others know neither. Depending on the user's knowledge and data, the interface helps the user solve the problem.

3-1 Guiding the user in choosing methods, procedures and results

Users have difficulty choosing the analytic method which solves their problem, and even selecting the appropriate procedure and its parameters. The interface has to direct the user to the analytic group, the analytic method and the procedures that are compatible with the data types and give the desired results. For example, if the user has a contingency table, the interface will propose to treat it by correspondence analysis, and specifically by the CORRESP procedure.

3-2 Generation of the SAS code

Once the results and the procedures are chosen, the interface can generate the code of the SAS program which meets the user's wishes, provided the user has described the data structure (types, names, columns ...). The information about the data structure is also used to check the compatibility between data types and procedures, and between data types and analytic methods.

3-3 Interpretation of the results

After generating the SAS code corresponding to the user's choices, the interface submits it to SAS to compute the results and gives the meaning of the interpretable results. These meanings must be given according to the interpretation rules relevant to each procedure.

4 The object oriented language choice

Object oriented design is essentially based on the concept of classes, the concept of inheritance, and the concept of methods and message passing. These concepts are well suited to modelling the user's problems when using data analysis.

4-1 The concepts of data analysis represented by classes

The concepts of analytic method and analytic group are appropriately represented by a class. We define a class, classe-gr-analytique, to represent the analytic group; its instances correspond to analytic methods, since they can be identified by approximately the same attributes as the group to which they are attached. We also define a class, classe-procedure, representing procedures, which is a sub-class of classe-gr-analytique. Besides the attributes of classe-gr-analytique, the procedure class needs attributes to identify its statements and options, which are themselves grouped into a class, classe-osres. Data analysis operates on tables, which are represented by the class classe-tableau.

4-2 The concepts of data analysis represented by hierarchy

An analytic method generally has the same conditions of applicability and the same goals as its group, and since inheritance factors out shared information, the common constraints are specified only at the group level [7]. The descendants of an analytic group inherit its characteristics, except for some attributes of the analytic method which are more precise than those of the group. For example, the factorial group handles every table type, but one of its analytic methods, correspondence analysis, accepts only contingency tables. The hierarchy also expresses the relationship between an analytic method and the procedures representing it.
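To make the inheritance mechanism of sections 4-1 and 4-2 concrete, the following Python sketch shows a constraint stated once at the group level and narrowed by a descendant. It is neither YAFOOL nor the author's implementation: the class `Frame`, its `lookup` and `accepts` methods and the slot name `table_types` are hypothetical, while the object names (factoriel, afc, corresp) and table type names (ind*ind, ind*var, contingence) are borrowed from the YAFOOL examples of section 5.

```python
class Frame:
    """A minimal frame: named slots plus a parent to inherit from."""

    def __init__(self, name, parent=None, **slots):
        self.name = name
        self.parent = parent
        self.slots = dict(slots)

    def lookup(self, slot):
        """Return the nearest definition of `slot`, walking up the hierarchy."""
        frame = self
        while frame is not None:
            if slot in frame.slots:
                return frame.slots[slot]
            frame = frame.parent
        raise KeyError(slot)

    def accepts(self, table_type):
        """True if this object, or its nearest ancestor, accepts the table type."""
        return table_type in self.lookup("table_types")


# The group states the common constraint once: every table type is accepted.
factoriel = Frame("factoriel", table_types={"ind*ind", "ind*var", "contingence"})

# afc (correspondence analysis) overrides the slot with a narrower value.
afc = Frame("afc", parent=factoriel, table_types={"contingence"})

# The procedure defines no constraint of its own and inherits the one of afc.
corresp = Frame("corresp", parent=afc)

print(corresp.accepts("contingence"), corresp.accepts("ind*var"))
```

With this layout the constraint on contingency tables is written in one place only, and any procedure added below afc picks it up without restating it.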


4-3 The concepts of data analysis represented by methods and messages

The procedures of the data analysis package present their results in different forms, so each procedure needs a specific treatment to read and collect its results. This treatment is appropriately represented by the concept of method-attribute offered by the object oriented language. We associate with each procedure a method-attribute whose goal is to collect the procedure's results. The interpretation of results also depends on the procedure, so we attach another method-attribute to each procedure to interpret the results. The interpretation of some results, such as the CHI2 or the correlation coefficient, is independent of the procedures generating them. As these results correspond to SAS statements or options, we associate a method-attribute with the class classe-osres to interpret the value of a result. This method-attribute is activated by a message from the procedure generating the result.

5 How the interface works and the objects structure

5-1 How the interface works

The interface establishes a dialogue with the user. At the beginning the user communicates the data structure (variable names, types, columns). The interface displays the types of tables it can accept. Depending on the selected table type, the interface presents the associated variable types. For example, if the user has a contingency table, the user has to indicate the variable types among measurable or frequency types. After describing the data, the user presents the problem. The user chooses between the dialogue intended for the novice and the one intended for the expert.

In the expert's dialogue, the interface presents the available analytic groups, followed by the analytic methods descending from the selected group. Once an analytic method is selected, the interface displays the procedures linked to it, and the user has to choose one. Once a procedure is selected, the interface gives the lists of results, options and statements associated with the choice; then the interface generates the SAS code corresponding to these choices and submits it to SAS. Finally the interface collects the results and begins interpreting them.

In the novice's dialogue, the interface displays the list of procedures that can treat the user's data, so that the user can consult the list of results corresponding to each procedure. Once a procedure is selected, the dialogue becomes identical to the one proposed to the expert. Each of these stages has specific help: for each procedure the user can obtain more information, and if a table type is ambiguous to the user, the help associated with each type can be consulted, etc.

5-2 The structure of the objects

To construct the interface we define objects that represent the different "operators" on the user's problem and those on the data analysis knowledge. The concepts of data analysis must be represented in an exploitable form so that the interface can inform the user about the conditions of applicability of a procedure, about the conclusion related to some results, etc. The set of classes we distinguish represents the analytic groups, the SAS procedures, the data tables, the SAS statements, the user's data and the user's problem.

5-2-1 The class representing the user's data

The class representing the user's data is called classe-donnees. It is composed of attributes representing the user's data type and the variables' types, columns and names.

5-2-2 The class representing the user's problem

The user's problem or situation is represented by the class classe-situation. Its attributes contain the name of the table the user needs to manipulate and the selected group, analytic method and procedure. Several method-attributes are attached to this class. One, called choixgroupe, is invoked to present the analytic groups compatible with the user's data, among which the user can choose. Another, called choixmethode, presents the analytic methods that descend from the selected group and are compatible with the user's data type. The method-attribute choixprocedure is activated after choixmethode to present the procedures corresponding to the analytic method the user selects. Once the choice is made, three other method-attributes are invoked in sequence to present the procedure's results, options and statements.

5-2-3 The class representing the tables

Data analysis manipulates different table types. The class classe-tableau represents the general structure of a table, and its attributes identify the accepted variable types. The example below shows the YAFOOL structure of the table type ind*var, which crosses observations and variables:

    herit    (classe-tableau objet-ideal)
    est-un   ((#:yafool:clamp ind*ind ind*var contingence)
              (value classe-tableau))
    instance t
    (ind*var
      (tab-groupe (value factoriel acp princomp afc corresp
                         descriptif moyenne means
                         stat-de-tableau tabulate
                         histogramme chart))
      (tab-var    (value (qualnom) (qualord) (quantmes)
                         (quantcom) (quantbin)))
      (tab-lib    (value individus*variables))
      (tab-typ    (value . ind*var)))
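A description of this kind is enough to check the user's variable types and to list what can be proposed for the table. The Python sketch below is only an illustration under stated assumptions: the slot names and values are copied from the ind*var example, but the dictionary layout and the functions `rejected_variables` and `candidates` are hypothetical, as are the variable names `age` and `region`.

```python
# The ind*var description, transcribed slot by slot from the YAFOOL example.
IND_VAR = {
    "tab-typ": "ind*var",
    "tab-lib": "individus*variables",
    "tab-var": ["qualnom", "qualord", "quantmes", "quantcom", "quantbin"],
    "tab-groupe": ["factoriel", "acp", "princomp", "afc", "corresp",
                   "descriptif", "moyenne", "means",
                   "stat-de-tableau", "tabulate", "histogramme", "chart"],
}

# Hypothetical registry of table descriptions, keyed by table type.
TABLES = {IND_VAR["tab-typ"]: IND_VAR}


def rejected_variables(table_type, variables):
    """Return the user's variables whose type the table type does not accept."""
    accepted = set(TABLES[table_type]["tab-var"])
    return {name: vtype for name, vtype in variables.items()
            if vtype not in accepted}


def candidates(table_type, variables):
    """Names attached to the table type, or [] if a variable type is rejected."""
    if rejected_variables(table_type, variables):
        return []
    return list(TABLES[table_type]["tab-groupe"])


# Hypothetical user data: variable name -> declared variable type.
user_variables = {"age": "quantmes", "region": "qualnom"}
print(candidates("ind*var", user_variables))
```

The sketch returns the content of tab-groupe unfiltered; separating groups, methods and procedures would rely on the hierarchy rather than on the table description.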

5-2-4 The class representing the analytic groups

The attributes of the class classe-gr-analytique identify the analytic group, the analytic method associated with it, the table types it accepts, the variable types it can handle and the results it can generate. One method-attribute is attached to this class to help the user understand the meaning of its goals. The instances of this class correspond to analytic methods.

5-2-5 The class representing the SAS procedures

Each SAS procedure is an instance of the class classe-procedure. The attributes of this class identify the procedure's name, its options, its results and its statements. We associate a method-attribute with this class to interpret the results. The example below shows the structure of the procedure PRINCOMP:

    herit    (acp factoriel gr-analytique classe-gr-analytique objet-ideal)
    est-un   ((value acp))
    instance t
    (princomp
      (gr-res         (value mean std n corr eigenval))
      (gr-but         (value analyse en composantes principales normees))
      (gr-typ-var     (value quantmes))
      (gr-typ-tab     (value ind*var))
      (gr-nom         (value . princomp))
      (proc-interp    (value princomp))
      (proc-statement (value by freq partial var weight))
      (proc-option    (value data out outstat cov noint prefix vardef
                             df wgt wdf noprint))
      (proc-res-opt   (value ustd ucorr ucov score uscore)))
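Such a description carries the information needed to write a PROC step and to reject choices the procedure does not support. The following Python sketch illustrates the idea; it is not the author's generator. The slot names and their values come from the PRINCOMP example, whereas the function `proc_step`, the data set names `mydata` and `scores` and the variables `x1`, `x2`, `x3` are hypothetical.

```python
# Subset of the PRINCOMP description, transcribed from the YAFOOL example.
PRINCOMP = {
    "gr-nom": "princomp",
    "proc-statement": ["by", "freq", "partial", "var", "weight"],
    "proc-option": ["data", "out", "outstat", "cov", "noint", "prefix",
                    "vardef", "df", "wgt", "wdf", "noprint"],
}


def proc_step(description, options, statements):
    """Build the text of a PROC step, rejecting names absent from the description."""
    for name in options:
        if name not in description["proc-option"]:
            raise ValueError(f"unknown option: {name}")
    for name in statements:
        if name not in description["proc-statement"]:
            raise ValueError(f"unknown statement: {name}")

    # Options with a value are written name=value; flags (value None) are written bare.
    parts = [name if value is None else f"{name}={value}"
             for name, value in options.items()]
    lines = [" ".join([f"proc {description['gr-nom']}"] + parts) + ";"]
    for name, args in statements.items():
        lines.append(f"  {name} {' '.join(args)};")
    lines.append("run;")
    return "\n".join(lines)


program = proc_step(
    PRINCOMP,
    options={"data": "mydata", "out": "scores", "cov": None},
    statements={"var": ["x1", "x2", "x3"]},
)
print(program)
```

For the choices above the sketch prints:

    proc princomp data=mydata out=scores cov;
      var x1 x2 x3;
    run;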

5-2-6 The class representing the SAS statements

We call this class classe-osres as it represents the Options, the Statements and the RESults. Each instance of this class is linked to the analytic groups, the analytic methods and the procedures to which it is attached.

6 Code generation and result interpretation

6-1 Code generation

After directing the user to the procedure which solves the problem, the interface can access the procedure's description, which is used to write code according to the SAS syntax. Using the object representing the data, the interface generates the SAS code for the DATA step. With the object representing the user's wishes and the one representing the selected procedure, the interface generates the PROC step of the SAS program. Once the program is written, the interface submits it to SAS and retrieves the results to present a tentative first interpretation.

6-2 Interpreting the results

The last step of each data analysis is the interpretation of results. We distinguish two levels of interpretation: the first is tightly related to the procedure, and the second has a meaning which is the same for all procedures. Many SAS statements correspond to results, and some of them have an interpretation which is independent of the procedure to which they are attached. We associate with such a statement a method-attribute (osres-interp) which interprets the value of the result, such as the CHI2. Interpreting the value of CHI2 consists of comparing the calculated value with the value in the chi-square table, to conclude whether the two variables are dependent. The procedure generating this kind of result sends a message to the object representing the result to invoke this method.

The interpretation step begins by sending a message to the selected procedure to activate its interpretation method. This method will send messages to some results to invoke their own interpretation. To illustrate, consider the CORRESP procedure, whose interpretation consists of selecting percentages of inertia; when it reads the value of the total chi-square, it sends a message to the instance of classe-osres having the same name to activate its interpretation.
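The two levels of interpretation can be sketched in Python as a procedure object that delegates shared results to the objects representing them. This is an illustration under stated assumptions, not the YAFOOL code: the classes `Chi2Result` and `Procedure`, the layout of the collected output and the sample values are hypothetical, and the 5% significance level is an assumption, since the text only mentions a comparison with the chi-square table. The critical values are the standard tabulated ones.

```python
# Critical values of the chi-square distribution at the 5% level,
# indexed by degrees of freedom (the 5% level is an assumption of this sketch).
CHI2_CRITICAL_5PCT = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}


class Chi2Result:
    """Stands in for the classe-osres instance that interprets a chi-square value."""

    def osres_interp(self, value, df):
        critical = CHI2_CRITICAL_5PCT[df]
        if value > critical:
            return f"chi2={value} > {critical}: the two variables appear dependent"
        return f"chi2={value} <= {critical}: no evidence of dependence"


class Procedure:
    """Stands in for a procedure object holding its own interpretation method."""

    def __init__(self, name, results):
        self.name = name
        self.results = results  # result name -> object able to interpret it

    def interpret(self, output):
        """Interpret collected output, delegating shared results by name."""
        messages = []
        for result_name, payload in output.items():
            target = self.results.get(result_name)
            if target is not None:
                # The "message": ask the result object to interpret itself.
                messages.append(target.osres_interp(**payload))
        return messages


corresp = Procedure("corresp", {"chi2": Chi2Result()})
report = corresp.interpret({"chi2": {"value": 12.3, "df": 4}})
print(report[0])
```

Because the chi-square interpretation lives in the result object, any other procedure producing a chi-square can reuse it by sending the same message.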

7 Conclusion

This work has provided us with some evidence that an object oriented programming approach is well suited to representing the data analysis field via classes and hierarchies. The interface can handle all SAS procedures, not only those of data analysis, but it is important to verify that the field they cover can be represented by classes and a hierarchy. As all procedures obey the same concepts, the only constraint depends on the scientific domain they represent. We also note that this interface can be adapted to any data analysis package, as such packages are composed of procedures.


REFERENCES

[1] Mohamed El-Hedi Zaiem. Les Methodes exploratoires de l'Analyse des Donnees. 1988.
[2] SAS User's Guide: Basics, Version 6 Edition. SAS Institute Inc., 1990.
[3] SAS User's Guide: Statistics, Version 6 Edition. SAS Institute Inc., 1990.
[4] Yet Another Frame-based Object Oriented Language, Version 3.22. Manuel de reference. SEMA GROUP, 1990.
[5] LE-LISP, version 15.21. Le Manuel de reference. INRIA, 1987.
[6] E. Diday, J. Lemaire, J. Pouget, F. Testu. Elements d'Analyse de Donnees. Dunod, 1982.
[7] G. Masini, A. Napoli, D. Colnet, D. Leonard, K. Tombre. Les langages a objets. InterEditions, 1989.
