Rough Family
Institute of Computing Science, Poznań University of Technology, Poznań, POLAND


Abstract

This note briefly describes the main programs of the Rough Family software package, i.e. ROSE and ProFIT. They are interactive software systems designed for data analysis and knowledge discovery using the rough set approach.

Introduction

The Rough Family is a set of programs which are implementations of basic functions of the rough set approach and rule discovery techniques. These programs have been developed in the Institute of Computing Science, Poznań University of Technology under the supervision of Roman Słowiński and Jerzy Stefanowski. The main collaborators directly involved in the process of designing and programming are Robert Mieńko, Bartłomiej Prędki and Robert Susmaga.

Currently, the package has two main components: ROSE and ProFIT. The RoughDAS program is historically one of the first successful implementations of the rough set methodology. According to the literature, it is the rough set based software most often used in real-life applications. RoughClass is an interactive system supporting the classification of new objects based on decision rules discovered from examples. The ProFIT program is an implementation of the generalized rough set model that handles uncertain input data resulting from imprecise or inexact attribute values, missing values, or attributes given in the form of real numbers and fuzzy linguistic qualifiers.

The aim of the Rough Family software is to enable the rough set based knowledge discovery process, i.e.: performing a rough set based analysis of the data (in particular, calculating approximations of decision classes, checking dependencies between attributes, looking for reduced subsets of attributes), extracting characteristic patterns from data, inducing decision rules from sets of learning examples, evaluating the discovered rules by means of different validation techniques, and constructing decision support systems based on knowledge represented in the form of decision rules.

Both programs accept input data in the form of a table called an information system, in which the rows correspond to objects (cases, observations, etc.) and the columns correspond to attributes (features, characteristics, etc.). The attributes are divided into disjoint sets of condition attributes (e.g. results of particular tests or experiments) and decision attributes (expressing the partition of objects into decisions, i.e. their classification). The input data can be either entered using an internal edit option or imported from text files. The input data files of all the programs are compatible with the basic file formats used in the ROSE system and also with the old formats introduced in the RoughDAS system, which makes communication between the programs possible.
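
The table structure described above can be sketched in a few lines of Python. This is only an illustration of the concept of an information system; the class name `InformationTable`, its fields, and the toy attribute names are assumptions made for this example and do not reflect the actual ROSE or RoughDAS file formats.

```python
from dataclasses import dataclass


@dataclass
class InformationTable:
    """Hypothetical in-memory information system (not the ROSE file format)."""
    condition_attrs: tuple  # names of the condition attributes
    decision_attr: str      # name of the decision attribute
    rows: tuple             # one dict per object: attribute name -> value

    def decision_classes(self):
        """Partition object indices by the value of the decision attribute."""
        classes = {}
        for i, row in enumerate(self.rows):
            classes.setdefault(row[self.decision_attr], set()).add(i)
        return classes


# Toy data invented for illustration only.
table = InformationTable(
    condition_attrs=("temperature", "headache"),
    decision_attr="flu",
    rows=(
        {"temperature": "high", "headache": "yes", "flu": "yes"},
        {"temperature": "high", "headache": "no", "flu": "yes"},
        {"temperature": "normal", "headache": "no", "flu": "no"},
    ),
)

print(table.decision_classes())  # {'yes': {0, 1}, 'no': {2}}
```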

Although the ROSE and ProFIT systems have been created for the MS-Windows environment running on PC compatible machines, their main computational modules (without the GUI part), which implement the rough set approach, are also available in versions running under Unix operating systems, e.g. on workstations or supercomputers.

The ROSE program

The program ROSE - Rough Set Data Explorer - is an interactive software system running under 32 bit GUI operating systems (Windows 95/NT 4.0) on PC compatible machines.

The input data to the ROSE program is the information system/table, which can either be defined using an internal editor or imported from a file. The data are stored in a text file according to a special syntax that, besides the description of objects by attributes, may contain additional information about the attributes, e.g. their type, definition of their domains, etc. ROSE also accepts file formats coming from other systems, i.e. from its predecessor RoughDAS, the input decision table format used in Grzymala's LERS system, and the formats of files containing learning examples for the well-known C4.5 machine learning system.

Besides being visualized in the GUI, all results are also written to plain text files, so they are readable outside the system and can easily be converted to other required file formats.

ROSE currently offers the following functions:
  • preprocessing of input data, e.g. detecting errors in the definition of input examples, handling missing values of attributes,
  • discretization of real valued attributes by means of various techniques,
  • qualitative estimation of the ability of the condition attributes to approximate the classification of objects, using either the standard rough set model or its variable precision model extension; in both cases, it is possible to compute approximations of classes with their accuracies, calculate the quality of the approximation of the classification, and check which objects belong to a given approximation; visualization of atoms and approximations is also available,
  • finding the core of the attributes as well as looking for reducts in the information system (either all reducts or a given number of the best reducts according to an approximation algorithm),
  • studying the significance of a given attribute for the classification of objects,
  • reducing superfluous attributes and selecting the most significant attributes for the classification of objects; several techniques are available that support the choice of subsets of attributes ensuring a satisfactory quality of classification (e.g., the technique of adding the most discriminatory attributes to the core),
  • inducing decision rules, certain or approximate, on the basis of approximations of decision classes,
  • postprocessing of induced rules, e.g. pruning; looking for interesting rules according to user-defined queries,
  • applying the decision rules to classify new objects by means of various strategies,
  • evaluation of the sets of decision rules by using k-fold cross validation techniques.
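
The approximation-related quantities in the list above (atoms, lower and upper approximations, accuracy, quality of classification) follow the standard rough set definitions and can be sketched as below. This is a minimal illustration of the standard model only, not the ROSE implementation; the function names and the toy table are assumptions made for this example.

```python
def atoms(rows, attrs):
    """Group object indices that are indiscernible on the given attributes."""
    groups = {}
    for i, row in enumerate(rows):
        groups.setdefault(tuple(row[a] for a in attrs), set()).add(i)
    return list(groups.values())


def approximations(rows, attrs, target):
    """Return the (lower, upper) approximation of a set of object indices."""
    lower, upper = set(), set()
    for atom in atoms(rows, attrs):
        if atom <= target:   # atom lies entirely inside the target set
            lower |= atom
        if atom & target:    # atom overlaps the target set
            upper |= atom
    return lower, upper


# Toy decision table invented for illustration; objects 0 and 1 are
# indiscernible on the condition attributes but have different decisions.
rows = [
    {"temperature": "high", "headache": "yes", "flu": "yes"},
    {"temperature": "high", "headache": "yes", "flu": "no"},
    {"temperature": "high", "headache": "no", "flu": "yes"},
    {"temperature": "normal", "headache": "no", "flu": "no"},
]
conds = ["temperature", "headache"]

classes = {}
for i, row in enumerate(rows):
    classes.setdefault(row["flu"], set()).add(i)

lower, upper = approximations(rows, conds, classes["yes"])
accuracy = len(lower) / len(upper)  # accuracy of approximation of one class

# Quality of classification: share of objects in some lower approximation.
quality = sum(
    len(approximations(rows, conds, c)[0]) for c in classes.values()
) / len(rows)

print(lower, upper)       # {2} {0, 1, 2}
print(accuracy, quality)  # 0.3333333333333333 0.5
```
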
Two groups of features distinguish ROSE from other rough set based software. The first group concerns the GUI, while the second is connected with particular methodological aspects. The graphical interface has been designed so that working with the program is easy and efficient. This applies in particular to editing and preprocessing the information system, presenting the rough set results, and visualizing the discovered rules. Moreover, the possibility of exchanging files with other systems seems to be important for users and for practical applications.

Methodologically, ROSE offers all the basic operations of the rough set approach necessary to perform the complete process of knowledge discovery. Particular attention has been paid to the selection of attributes: it is not limited to looking for all reducts but is extended by several original approximation techniques that can be easily controlled by the user. The unique methodological features also include a choice of rule induction techniques: the user can generate a minimum set of rules, an exhaustive set, or a satisfactory set. The first technique describes the input objects with the minimum number of necessary rules, while the second tries to generate all decision rules that can be discovered from the given information system. The third technique yields the set of decision rules that satisfy requirements given a priori by the user. For example, the user may prefer to discover all strong decision rules, i.e. rules supported by a relatively large number of input objects. This last approach is particularly useful for interactive knowledge discovery. Lastly, ROSE offers an original and efficient approach to using decision rules to build a classification system. It is based on a valued closeness relation and makes the system competitive with other well-known machine learning and classification systems.
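
The notions of reducts and the core used in attribute selection can be illustrated with a brute-force search. The sketch below assumes the usual textbook definitions (a reduct is a minimal subset of condition attributes that preserves the positive region; the core is the intersection of all reducts). It is exponential in the number of attributes and is not one of the approximation techniques implemented in ROSE; all names and data are hypothetical.

```python
from itertools import combinations


def positive_region(rows, attrs, decision):
    """Objects whose indiscernibility class is consistent on the decision."""
    groups = {}
    for i, row in enumerate(rows):
        groups.setdefault(tuple(row[a] for a in attrs), []).append(i)
    pos = set()
    for members in groups.values():
        if len({rows[i][decision] for i in members}) == 1:
            pos.update(members)
    return pos


def reducts(rows, conds, decision):
    """All minimal attribute subsets preserving the full positive region."""
    full = positive_region(rows, conds, decision)
    found = []
    for size in range(1, len(conds) + 1):
        for subset in combinations(conds, size):
            if any(set(r) <= set(subset) for r in found):
                continue  # contains a smaller reduct, so it is not minimal
            if positive_region(rows, subset, decision) == full:
                found.append(subset)
    return found


# Toy table invented for illustration: d = a XOR b, and c duplicates b.
rows = [
    {"a": 0, "b": 0, "c": 0, "d": 0},
    {"a": 0, "b": 1, "c": 1, "d": 1},
    {"a": 1, "b": 0, "c": 0, "d": 1},
    {"a": 1, "b": 1, "c": 1, "d": 0},
]
conds = ["a", "b", "c"]

all_reducts = reducts(rows, conds, "d")
core = set(conds).intersection(*map(set, all_reducts))

print(all_reducts)  # [('a', 'b'), ('a', 'c')]
print(core)         # {'a'}
```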

Moreover, ROSE has a modular software construction that allows for its future development and easy adaptation to various user requirements and to the specifics of a given application.

The ProFIT program

The program ProFIT - Rough Processing of Fuzzy Information Tables - is an implementation of the generalization of the rough set theory that handles uncertainty in the definition of the information system. Recall that in the standard rough set model it is assumed that each pair [object, attribute] must be defined in a unique and precise way. In practice, however, these pairs may be neither unique nor precise, i.e. they can be uncertain. The generalization considered here makes it possible to take into account the following situations: uncertain discretization of quantitative attributes, imprecise or inexact values of numerical attributes, and multiple values possible for one pair [object, attribute], given e.g. in the form of linguistic fuzzy qualifiers. A special way of modelling these three types of uncertainty uses fuzzy set theory, which reduces them to so-called multiple fuzzy descriptors. The generalization thus preserves all characteristic features of the rough set approach while enabling reasoning about uncertain data.
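
The idea of describing one [object, attribute] pair by several fuzzy descriptors can be illustrated with a generic fuzzy discretization. The sketch below uses triangular membership functions, a common textbook choice; the linguistic terms, their breakpoints, and the function names are assumptions made for this example and are not taken from ProFIT.

```python
def triangular(x, left, peak, right):
    """Membership degree of x in a triangular fuzzy set."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


# Hypothetical linguistic terms for a body temperature attribute (deg C).
TERMS = {
    "low": (30.0, 35.0, 37.0),
    "normal": (35.0, 37.0, 39.0),
    "high": (37.0, 39.0, 44.0),
}


def fuzzy_descriptors(x):
    """Map a real value to all terms with a non-zero membership degree."""
    degrees = {term: triangular(x, *params) for term, params in TERMS.items()}
    return {term: d for term, d in degrees.items() if d > 0}


print(fuzzy_descriptors(38.0))  # {'normal': 0.5, 'high': 0.5}
print(fuzzy_descriptors(37.0))  # {'normal': 1.0}
```

A crisp discretization would assign 38.0 to exactly one interval; here the value belongs partly to two terms, which is the kind of multiple description the generalized model has to handle.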

Some of the operations offered by the program are the same as those of ROSE, but instead of producing standard rough set results, ProFIT generates results specific to the generalized rough set theory, e.g. generalized accuracies of approximations or generalized fuzzy decision rules.

The new features of the ProFIT program include, e.g.:
  • accepting new representations of the input data, including new types of attributes, e.g. their single values can be replaced by sets of possible values; attribute values in the input table may be also given in the form of fuzzy numbers,
  • discretizing the real valued attributes using various discretization algorithms; in particular, the user can choose fuzzy discretization instead of a crisp one,
  • an extended, user-friendly edit option in the GUI that makes it possible to visualize and analyse different aspects of the fuzzy set representation of the input information table,
  • easy handling of degrees of possibility for objects in the fuzzy information systems necessary to obtain specific results of the generalized rough set theory,
  • inducing fuzzy decision rules,
  • classifying new objects with the fuzzy decision rules by means of fuzzy logic and valued closeness relation principles; it is also possible to evaluate the quality of decision rules with the help of standard cross-validation tests.
The current version of ProFIT runs in the MS-Windows ver. 3.1 (or higher) environment on PC compatible machines.

A survey of applications

The programs of the Rough Family and their predecessors RoughDAS and RoughClass have been applied in many fields, e.g. medicine, pharmacy, technical diagnostics, finance and management, image and signal analysis, etc. References to these applications are given, e.g., in [4, 2, 12, 5]. For example, the software has been successfully applied to analyse:
  • the treatment of patients with duodenal ulcer after highly selective vagotomy,
  • multi-stage therapeutic process for patients with peritoneal lavage in acute pancreatitis,
  • attribute dependencies in a large data set concerning the treatment of urinary stones by the ESWL technique,
  • surgery experience concerning patients with multiple injuries,
  • problems of fast diagnosing appendicitis at emergency units,
  • chemical structures of pharmaceutical compounds (e.g. activity relationship of quaternary imidazolium or pyridinium compounds),
  • processing of histological images,
  • technical diagnostics of industrial machinery (e.g. rolling bearings, reducers),
  • maintenance procedures in public transportation systems,
  • financial data concerning evaluation of bankruptcy risk and loan assignment,
  • multi-attribute decision problems,
  • geological problems, e.g. identifying premonitory factors for earthquakes with an emphasis on gas geochemistry in Belgium,
  • software project evaluation.