SOFTWARE
This note briefly describes the main programs of the Rough Family software package, namely ROSE and ProFIT. They are interactive software systems designed for data analysis and knowledge discovery using the rough set approach. The Rough Family is a set of programs that implement the basic functions of the rough set approach and rule discovery techniques. These programs have been developed at the Institute of Computing Science, Poznań University of Technology, under the supervision of Roman Słowiński and Jerzy Stefanowski. The main collaborators directly involved in design and programming are Robert Mieńko, Bartłomiej Prędki and Robert Susmaga.
Currently, the package has two main components: ROSE and ProFIT. The RoughDAS program is historically one of the first successful implementations of the rough set methodology. According to the literature, it is the rough set based software most often used in real-life applications. RoughClass is an interactive system supporting the classification of new objects based on decision rules discovered from examples. The ProFIT program is an implementation of the generalized rough set model that handles uncertain input data resulting from imprecise or inexact attribute values, missing values, or attributes given in the form of real numbers and fuzzy linguistic qualifiers.
The aim of the Rough Family software is to support the rough set based knowledge discovery process, i.e.: performing a rough set based analysis of the data (in particular, calculating approximations of decision classes, checking dependencies between attributes, and looking for reduced subsets of attributes); extracting characteristic patterns from data; inducing decision rules from sets of learning examples; evaluating the discovered rules by means of different validation techniques; and constructing decision support systems based on knowledge represented in the form of decision rules.
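To make the first step of this process more concrete, the following minimal Python sketch computes the lower and upper approximations of a decision class, together with the accuracy of approximation, for a small table. It is not code from ROSE or ProFIT and does not reflect their internal design: the function names, attribute names and toy data are all hypothetical, and the sketch assumes the standard rough set model, in which objects with identical values on the condition attributes are indiscernible.

```python
from collections import defaultdict


def indiscernibility_classes(table, attributes):
    """Group object ids that have identical values on `attributes`."""
    classes = defaultdict(set)
    for obj_id, row in table.items():
        key = tuple(row[a] for a in attributes)
        classes[key].add(obj_id)
    return list(classes.values())


def approximations(table, condition_attrs, target):
    """Return the (lower, upper) approximations of the object set `target`."""
    lower, upper = set(), set()
    for block in indiscernibility_classes(table, condition_attrs):
        if block <= target:   # block lies entirely inside the class
            lower |= block
        if block & target:    # block overlaps the class
            upper |= block
    return lower, upper


# Hypothetical toy table: two condition attributes and one decision attribute.
table = {
    "x1": {"temp": "high", "cough": "yes", "flu": "yes"},
    "x2": {"temp": "high", "cough": "yes", "flu": "no"},
    "x3": {"temp": "normal", "cough": "no", "flu": "no"},
    "x4": {"temp": "high", "cough": "no", "flu": "yes"},
}
condition_attrs = ["temp", "cough"]

# Decision class: all objects with flu == "yes".
flu_yes = {obj for obj, row in table.items() if row["flu"] == "yes"}

lower, upper = approximations(table, condition_attrs, flu_yes)
accuracy = len(lower) / len(upper)  # accuracy of approximation

print(sorted(lower))  # ['x4']
print(sorted(upper))  # ['x1', 'x2', 'x4']
```

Objects x1 and x2 are indiscernible on the condition attributes but belong to different decision classes, so they fall into the upper approximation only; this boundary is what keeps the accuracy of approximation below 1.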
Both programs accept input data in the form of a table called an information system, in which the rows correspond to objects (cases, observations, etc.) and the columns correspond to attributes (features, characteristics, etc.). The attributes are divided into disjoint sets of condition attributes (e.g. results of particular tests or experiments) and decision attributes (expressing the partition of objects into decision classes, i.e. their classification). The input data can be either entered using an internal editor or imported from text files. The input data files of all the programs are compatible with the basic file formats used in the ROSE system and with the older formats introduced in the RoughDAS system, which makes communication between the programs possible.
Although the ROSE and ProFIT systems have been created for the MS-Windows environment running on PC compatible machines, their main computational modules (without the GUI part), which implement the rough set approach, are also available in versions running under Unix operating systems, e.g. on workstations or supercomputers.
The program ROSE (Rough Set Data Explorer) is an interactive software system running under 32-bit GUI operating systems (Windows 95/NT 4.0) on PC compatible machines. The input data to the ROSE program is the information system (table), which can be either defined using an internal editor or imported from a file. The data are stored in a text file according to a special syntax that, besides the description of objects by attributes, may contain additional information about the attributes, e.g. their type or the definition of their domains. ROSE also accepts file formats coming from other systems, i.e. from its predecessor RoughDAS, the input decision table used in Grzymala's LERS system, and the formats of files containing learning examples for the well-known C4.5 machine learning system.
Besides being visualized in the GUI, all results are also written to plain text files, so they are readable outside the system and can easily be converted to other required file formats. ROSE currently offers the following functions:
Moreover, ROSE has a modular software construction that allows for its future development and easy adaptation to various user requirements and to the specifics of given applications.
The program ProFIT (Rough Processing of Fuzzy Information Tables) is an implementation of a generalization of rough set theory that handles uncertainty in the definition of the information system. Let us recall that in the standard rough set model it is assumed that each pair [object, attribute] must be defined in a unique and precise way. In practice, however, these pairs may be neither unique nor precise, i.e. they can be uncertain. The generalization considered here allows one to take into account the following situations: uncertain discretization of quantitative attributes; imprecise or inexact values of numerical attributes; and multiple values possible for one pair [object, attribute], given, e.g., in the form of linguistic fuzzy qualifiers. A special way of modelling these three types of uncertainty uses fuzzy set theory, which reduces them to so-called multiple fuzzy descriptors. The generalization then preserves all characteristic features of the rough set approach while enabling reasoning about uncertain data. Some of the operations offered by the program are the same as those of ROSE, but instead of producing standard rough set results, ProFIT generates results specific to the generalized rough set theory, e.g. generalized accuracies of approximations or generalized fuzzy decision rules. The new features of the ProFIT program include, e.g.:
The programs of the Rough Family and their predecessors RoughDAS and RoughClass have been applied in many fields, e.g. medicine, pharmacy, technical diagnostics, finance and management, and image and signal analysis. References to these applications are given, e.g., in [4, 2, 12, 5]. For example, the software has been successfully applied to analyse: