RSDM
Departamento de Lenguajes y Sistemas Informáticos e Ingeniería del Software, Facultad de Informática, Campus de Montegancedo, Madrid, SPAIN


Abstract

RSDM is the architecture of a system that adds knowledge discovery in databases (KDD) capabilities to relational database management systems (RDBMS). It also provides an API for adding new Data Mining capabilities to the system in a way that is transparent to the end user. As a result, RSDM acts as a generic engine for Data Mining algorithms. Three main goals guided the design and development of the system:
  • Efficiency when dealing with extra-large volumes of data
  • Easy enhancement of the system
  • Independence from the underlying RDBMS
Several techniques have been applied to provide Data Mining capabilities, but Rough Set methodology deserves particular mention, since it underlies most of the functionality already working in the system.

Capabilities added by the system

As already mentioned, RSDM keeps the power of the relational system while adding Data Mining capabilities. Rough Set methodology, generalization and relational database techniques, among others, have been integrated to provide the following features:
  • Discretization: When dealing with quantitative data, the values need to be discretized before any mining algorithm is applied.
  • Reduction of the set of attributes: A reducing algorithm discovers dependencies and redundancies among the data. This allows the system to eliminate redundant attributes. The algorithm is based on the ideas of …
  • Extraction of discriminant rules: Given a concept, the discriminant procedure finds rules that discriminate the objects belonging to it from those that do not.
  • Extraction of characteristic rules: This procedure produces a set of rules that characterize the elements of a given set.
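The attribute-reduction capability rests on standard Rough Set notions: indiscernibility classes, the positive region and the degree of dependency. The following minimal sketch, assuming a decision table held in memory as a list of Python dicts, computes these and finds one reduct by greedy backward elimination. The function names and the toy table are hypothetical, and the greedy strategy is an illustrative choice, not necessarily the reducing algorithm RSDM uses.

```python
from collections import defaultdict


def partition(rows, attrs):
    """Group row indices into indiscernibility classes over `attrs`."""
    classes = defaultdict(list)
    for i, row in enumerate(rows):
        classes[tuple(row[a] for a in attrs)].append(i)
    return list(classes.values())


def positive_region(rows, attrs, decision):
    """Row indices whose class over `attrs` has a single `decision` value."""
    pos = set()
    for cls in partition(rows, attrs):
        if len({rows[i][decision] for i in cls}) == 1:
            pos.update(cls)
    return pos


def dependency(rows, attrs, decision):
    """Degree of dependency gamma = |POS_attrs(decision)| / |U|."""
    return len(positive_region(rows, attrs, decision)) / len(rows)


def greedy_reduct(rows, attrs, decision):
    """Drop attributes one by one while the positive region keeps its size."""
    target = len(positive_region(rows, attrs, decision))
    reduct = list(attrs)
    for a in attrs:
        candidate = [b for b in reduct if b != a]
        if candidate and len(positive_region(rows, candidate, decision)) == target:
            reduct = candidate
    return reduct


# Toy decision table (hypothetical): "temperature" alone determines "flu".
table = [
    {"headache": "yes", "temperature": "high", "flu": "yes"},
    {"headache": "yes", "temperature": "normal", "flu": "no"},
    {"headache": "no", "temperature": "high", "flu": "yes"},
    {"headache": "no", "temperature": "normal", "flu": "no"},
]

print(greedy_reduct(table, ["headache", "temperature"], "flu"))  # ['temperature']
print(dependency(table, ["headache"], "flu"))                     # 0.0
```

Backward elimination keeps an attribute only if removing it would shrink the positive region. Since the positive region can only shrink as attributes are dropped, the result is a minimal attribute subset that preserves it, though not necessarily the smallest such subset.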
Architecture

To provide the functionality just described, the following architecture has been developed:
  • User Communication Module: This module handles queries from the user. It translates queries expressed in a communication grammar into a sequence of orders sent to the Working Area. Once the query has been solved, this module returns the result to the user.
  • Working Area: This module manages the data the system is working with at any given moment. Both the data targeted by the operators and their results are stored here. The User Communication Module then takes the results and presents them to the user.
  • Operator Loader: The User Communication Module sends user queries to this module. Regardless of the kind of algorithm, every algorithm is decomposed into atomic operations that are executed in parallel in order to gain efficiency. Each of these atomic units is called an ‘operator’. To solve a query, this module loads the needed operators dynamically. Because operators are loaded dynamically, new algorithms can be added to the system without affecting its structure. In this way, the RSDM architecture supports new algorithms by providing them with the access, storage and communication methods they need to work properly.
  • Data Mining Catalog: Some DM queries require calculations on the data before a process can be applied. To gain efficiency, information that would otherwise have to be calculated repeatedly when applying a particular algorithm is precalculated and stored by this module. Some of the information needed is already kept in the catalog of the relational database management system. Storing this information requires additional storage space, but the improvement in the efficiency of the whole process justifies the cost.
  • Access to the Database: The modules of the system operate independently of the underlying RDBMS, which is transparent to them. Requests for data are sent to this module, which translates them into the language of the particular database. Thanks to the transparency supplied by this module, data from different platforms can be input to the system. In short, the database independence that RSDM provides makes it possible both to store and organize data in different databases and to use data from different platforms.
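The interplay between the Operator Loader and the Working Area can be illustrated with a small registry. In this minimal sketch, assuming operators are plain Python functions registered by name, a query plan is a list of steps, each naming an operator, the Working Area slot it reads and the slot it writes. All names (`OPERATORS`, `register_operator`, `WorkingArea`, `run_plan`) are hypothetical; a real loader could populate the registry at run time, for example by importing plug-in modules with `importlib.import_module`.

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical operator registry standing in for the Operator Loader.
OPERATORS: Dict[str, Callable[..., Any]] = {}


def register_operator(name: str):
    """Decorator that registers a function as a named operator."""
    def register(fn: Callable[..., Any]) -> Callable[..., Any]:
        OPERATORS[name] = fn
        return fn
    return register


class WorkingArea:
    """Named slots holding target data and intermediate results."""

    def __init__(self) -> None:
        self.slots: Dict[str, Any] = {}

    def put(self, key: str, value: Any) -> None:
        self.slots[key] = value

    def get(self, key: str) -> Any:
        return self.slots[key]


@register_operator("project")
def project(rows, attrs):
    """Keep only the given attributes of each row, as tuples."""
    return [tuple(row[a] for a in attrs) for row in rows]


@register_operator("distinct")
def distinct(tuples):
    """Remove duplicate tuples (sorted for a deterministic result)."""
    return sorted(set(tuples))


Step = Tuple[str, str, str, Dict[str, Any]]


def run_plan(area: WorkingArea, plan: List[Step]) -> WorkingArea:
    """Run (operator, input slot, output slot, kwargs) steps in order."""
    for op_name, src, dst, kwargs in plan:
        op = OPERATORS[op_name]  # resolved by name only when the plan runs
        area.put(dst, op(area.get(src), **kwargs))
    return area


area = WorkingArea()
area.put("table", [{"temperature": "high"}, {"temperature": "normal"},
                   {"temperature": "high"}])
run_plan(area, [
    ("project", "table", "temps", {"attrs": ["temperature"]}),
    ("distinct", "temps", "values", {}),
])
print(area.get("values"))  # [('high',), ('normal',)]
```

Because `run_plan` only looks operators up by name, adding a capability amounts to registering one more function; the plan format, the Working Area and the other operators stay unchanged.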
Implementation

One of the main goals of our design is an architecture that allows efficient management of extra-large volumes of data, to help companies handle and analyse their data. Parallelism techniques have been used so that the system can deal with such volumes efficiently.

In particular, the Light Weight Process (LWP) technique has been used in the implementation of RSDM. This technique allows different processes to execute concurrently in a shared-memory environment. Applying LWP makes it possible to execute in parallel the set of atomic operations into which each algorithm is structured. Each of these operators can be applied to a different set of target data.
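A rough Python analogue of this scheme uses threads, which, like LWPs, run concurrently and share the address space of a single process. The sketch below is illustrative only: it assumes a hypothetical atomic operator, `count_values`, that tallies the values of one attribute over a range of rows, runs one instance per partition of the target data and merges the partial results. In CPython the global interpreter lock prevents CPU-bound threads from running truly in parallel, so this shows the structure of the approach rather than its performance.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_values(rows, attr, start, stop):
    """Atomic operator: frequencies of `attr` values in rows[start:stop]."""
    # Index into the shared list instead of slicing, so no copy is made.
    return Counter(rows[i][attr] for i in range(start, stop))


def parallel_counts(rows, attr, workers=4):
    """Run one operator instance per partition and merge the results.

    All threads read the same `rows` list held in shared memory.
    """
    chunk = max(1, -(-len(rows) // workers))  # ceiling division
    bounds = [(lo, min(lo + chunk, len(rows)))
              for lo in range(0, len(rows), chunk)]
    total = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(count_values, rows, attr, lo, hi)
                   for lo, hi in bounds]
        for future in futures:
            total.update(future.result())
    return total


rows = [{"temperature": "high"}] * 5 + [{"temperature": "normal"}] * 3
print(parallel_counts(rows, "temperature", workers=3))
```

Partitioning by index range keeps each operator instance independent, so the only synchronization needed is the final merge of the partial counters.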

State of the art of the implementation

A prototype of the generic engine is already working. Algorithms for calculating the positive region and reducts are also working, and characteristic rules can be extracted with their aid. Association rule algorithms are under development, as is a graphical user interface.

Comparison with other systems

RSDM has been conceived as an engine of KDD algorithms rather than as a system that adds some particular capabilities. This approach has both advantages and disadvantages. On the one hand, unlike existing Data Mining systems, an engine of algorithms makes it possible to add a new capability with the sole task of building the module that executes it, avoiding the complex process of coding not only the algorithm but also the communication, the storage of intermediate results and so on. On the other hand, constructing the architecture is more complex, which is why some Data Mining capabilities, such as association rules, prediction and generalization tasks, are not yet available. However, it should be stressed once again that adding any of these capabilities is a straightforward task once the architecture has been finished.

Integration with RDBMS: RSDM provides an API to integrate different commercial and non-commercial database management systems. At present, the system interfaces with Postgres and Oracle.
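Such an API is commonly structured as an adapter interface with one implementation per DBMS. The following sketch assumes a hypothetical `DatabaseAdapter` base class whose only backend-specific method is `query`; because Postgres and Oracle drivers are not part of the Python standard library, the standard `sqlite3` module stands in as the concrete backend. The class and method names are illustrative, not RSDM's actual API.

```python
import sqlite3
from abc import ABC, abstractmethod


class DatabaseAdapter(ABC):
    """Hypothetical backend-neutral interface seen by the other modules."""

    @abstractmethod
    def query(self, sql, params=()):
        """Run a read-only statement and return all rows as tuples.

        The parameter placeholder style is driver-specific, so each
        backend is responsible for it.
        """

    def fetch_columns(self, table, columns):
        """Fetch the given columns of a table in a backend-neutral way.

        Identifiers use standard SQL double-quoted delimiters; they are
        assumed to come from the system catalog, never from raw user input.
        """
        cols = ", ".join(f'"{c}"' for c in columns)
        return self.query(f'SELECT {cols} FROM "{table}"')


class SQLiteAdapter(DatabaseAdapter):
    """Concrete backend; a Postgres or Oracle adapter would wrap its driver."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)

    def query(self, sql, params=()):
        return self.conn.execute(sql, params).fetchall()


db = SQLiteAdapter()
db.conn.execute("CREATE TABLE patients (temperature TEXT, flu TEXT)")
db.conn.executemany("INSERT INTO patients VALUES (?, ?)",
                    [("high", "yes"), ("normal", "no")])
print(db.fetch_columns("patients", ["flu"]))
```

The rest of the system would call only `fetch_columns` and `query`, so supporting another DBMS means writing one more subclass.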

Methodologies applied in the algorithms: Rough Set theory and relational theory have both been integrated into the discovery algorithms already implemented in the system. To improve efficiency, Rough Set operations have first been translated into relational algebra (when possible) and then into SQL. For a detailed description of the algorithms, see …
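As an illustration of this kind of translation, the size of the positive region POS_B(D) can be computed in a single SQL statement: group the rows by the condition attributes B and keep only the groups in which the decision attribute D takes one distinct value. The sketch below, which assumes a hypothetical table `patients` and generates the query from Python, runs it on SQLite; the query itself uses only standard SQL constructs. It is an example of the technique, not the SQL that RSDM actually generates.

```python
import sqlite3


def positive_region_sql(table, condition_attrs, decision):
    """SQL counting the rows in POS_B(D) for B = condition_attrs, D = decision.

    Each GROUP BY group is one indiscernibility class over B; a class lies in
    the positive region when D takes a single distinct value in it. Assumes D
    contains no NULLs, since COUNT(DISTINCT ...) ignores them.
    """
    group_by = ", ".join(f'"{a}"' for a in condition_attrs)
    return (
        "SELECT COALESCE(SUM(n), 0) FROM ("
        f'SELECT COUNT(*) AS n FROM "{table}" '
        f"GROUP BY {group_by} "
        f'HAVING COUNT(DISTINCT "{decision}") = 1) classes'
    )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patients (headache TEXT, temperature TEXT, flu TEXT)")
conn.executemany("INSERT INTO patients VALUES (?, ?, ?)", [
    ("yes", "high", "yes"), ("yes", "normal", "no"),
    ("no", "high", "yes"), ("no", "normal", "no"),
])
total = conn.execute("SELECT COUNT(*) FROM patients").fetchone()[0]
for attrs in (["temperature"], ["headache"]):
    pos = conn.execute(positive_region_sql("patients", attrs, "flu")).fetchone()[0]
    print(attrs, pos, pos / total)  # pos / total is the degree of dependency
```

Pushing the grouping into the database means only a single number crosses the connection, rather than the whole table.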

Conclusions and future work

The architecture and main properties of the RSDM system have been explained. The methodology used, as well as its advantages and disadvantages in comparison with other systems, has been discussed. We are currently working on the implementation and testing of a tightly-coupled release of the algorithms, as well as on the implementation of association rules, extraction and discretization modules. The design of a proper Data Warehouse to support the mining tasks is under development. As a result, the data dictionary will be enhanced to hold the metadata (data about the data) necessary for efficient mining of the database.

Acknowledgements

We are very much indebted to Dr. Ziarko, Dr. Pawlak and Dr. Skowron for inspiration. Thanks are due to Dr. Wasilewska and Dr. Hadjimichael for several helpful comments.