Volume No. 12
In the series "Taxonomy"
"Classification and data analysis – theory and applications"
Andrzej Sokołowski, Krzysztof Jajuga – IFCS-2004 Chicago – some impressions
The paper presents some facts and remarks on the IFCS world conference held in Chicago. First,
some facts about the IFCS and SKAD, its Polish member society, are presented. Then we analyse the
contents of the papers presented at the conference.
Kazimierz Zając, Daniel Kosiorowski – Cultural changes and socio-economic development
The objective of the paper is to verify the hypothesis that there is a relation between
socio-economic and cultural development. In the empirical studies conducted for Polish territorial
units the following multivariate statistical methods were applied: canonical correlation analysis,
factor analysis and Hotelling's T² statistic. The main conclusion drawn from the studies is that
cultural development is one of the national priorities, since it leads to socio-economic development.
Joanicjusz Nazarko, Joanna Chrabołowska – Application of benchmarking techniques to efficiency
assessment of Polish distribution utilities
The paper presents an overview of benchmarking techniques applied to electricity distribution utilities. A
benchmarking study of 33 Polish electricity distribution utilities is presented to illustrate the methodological
and data issues. The study examines the effect of the choice of benchmarking methods using DEA and
Krzysztof Jajuga – Dependence in data analysis – discussion on some approaches
The paper discusses methods of dependence analysis for continuous variables. Different criteria for
systematizing dependence analysis methods are proposed. Particular attention is paid to copula
analysis, which generalizes the traditional approach to analysing linear dependence. Methods for
analysing extreme dependence are also given. Finally, the author proposes an approach to analysing
dependence in the case of heterogeneous
Artur Łapczyński, Marek Niedźwiecki, Andrzej Sokołowski – Radio 3 Hit List – statistical
Radio 3 Hit List is the most popular weekly pop chart in Poland, based on listeners' votes. The
history of pop charts, the most popular charts in the world and their place in the entertainment
market are presented in the first part of the paper. Then the Radio 3 Hit List is characterised, with
emphasis on its voting systems, chart rules and special features such as the “waiting room”, the
“unlucky seven”, etc. A special database covering all weekly editions up to the end of 2003 has been
prepared. More than 4000 songs, described by their chart-run characteristics, are clustered, and the
resulting groups are characterised by averages and by typical representatives. Other statistical
problems are also addressed, such as the distribution of popularity, chart survival curves, outliers,
voting fraud, the efficiency of different voting systems and voting generators.
Tadeusz Borys, Małgorzata Markowska – Social indicators of sustainable development
In this article sustainable development is interpreted as an integrated order in which, apart from
the economic and environmental orders, a substantial role is played by the social order. The main
objective of the study is to identify social indicators of sustainable development. The article presents
Polish experience in developing sets of such indicators at the national level, against the background
of other European Union countries, and also in the context of the indicator-based monitoring now
being created for two basic strategic documents of the European Union: the Lisbon Strategy and the
EU Sustainable Development Strategy.
Selected social indicators of sustainable development for the European Union countries are also
presented. A comparative analysis over space and time shows the position of Poland relative to other
countries, to the mean values for the EU-15 before the accession, and to the mean values for the
European Union in its present shape (25 countries).
Eugeniusz Gatnar – Feature selection for aggregated classification models
Significant improvement of classification accuracy can be obtained by aggregating multiple
models. Known methods in this field are mostly based on sampling cases from the training set or on
changing the weights of cases.
Further reduction of classification error can be achieved by random selection of variables for the
training subsamples or directly for the model. In this paper we propose a new correlation-based
feature selection method for classifier ensembles (CFSH) that is contextual (it uses feature
intercorrelations) and based on the Hellwig heuristic. It gives more accurate aggregated models than
those built with other correlation-based feature selection methods.
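To show what the Hellwig heuristic itself measures, here is a minimal sketch of the classical criterion only; it is not the CFSH method proposed in the paper. For a candidate subset S, the integral information capacity is H(S) = sum over j in S of r0j² / sum over i in S of |rij|, where r0j is the correlation of feature j with the target and rij the feature intercorrelation (rjj = 1). The exhaustive search is an assumption that is only practical for a handful of features.

```python
from itertools import combinations

import numpy as np


def hellwig_capacity(r0, R, subset):
    """Integral information capacity H of a feature subset (Hellwig)."""
    total = 0.0
    for j in subset:
        # Individual capacity: squared target correlation, penalised by the
        # absolute correlations of feature j with all chosen features (incl. itself).
        denom = sum(abs(R[i, j]) for i in subset)
        total += r0[j] ** 2 / denom
    return total


def hellwig_best_subset(X, y):
    """Exhaustively search all non-empty subsets for the maximal capacity."""
    X = np.asarray(X, float)
    p = X.shape[1]
    R = np.corrcoef(X, rowvar=False)
    r0 = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(p)])
    best, best_h = None, -np.inf
    for size in range(1, p + 1):
        for subset in combinations(range(p), size):
            h = hellwig_capacity(r0, R, subset)
            if h > best_h:  # strict: ties keep the smaller subset found first
                best, best_h = subset, h
    return best, best_h


# Hypothetical data: three orthogonal features, target built from the first two.
a = np.array([1.0, -1.0, 1.0, -1.0])
b = np.array([1.0, 1.0, -1.0, -1.0])
c = np.array([1.0, -1.0, -1.0, 1.0])
X = np.column_stack([a, b, c])
print(hellwig_best_subset(X, a + b))
```

Because a redundant feature inflates the denominators of its correlated partners, the criterion favours subsets that are strongly related to the target but weakly related to each other.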
Mirosława Lasek – Feature selection in clustering – application of the feature selector heuris-
The aim of this paper is to discuss problems concerning feature selection for grouping objects and
to propose a heuristic method for determining the features that discriminate between objects as well
as possible. In the proposed method we start with a single feature and add further features one at a
time until, under the assumed quality criteria, no further improvement in clustering quality is
possible. The algorithm is described, and its strengths and weaknesses are examined and discussed.
The method is illustrated with an example of feature selection for clustering industrial enterprises in
order to analyse and compare their financial standing.
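The stopping rule described above is a greedy forward search. The sketch below assumes the clustering quality is supplied as a callable `quality(subset)` (a hypothetical interface, for example a validity index computed on a clustering of the chosen features); the paper's actual quality criteria are not reproduced here.

```python
def forward_select(n_features, quality):
    """Greedy forward selection: add one feature at a time while quality improves.

    quality: hypothetical callable mapping a tuple of feature indices to a
    clustering-quality score (higher is better).
    """
    selected, best = (), float("-inf")
    remaining = set(range(n_features))
    while remaining:
        # Score every one-feature extension of the current subset.
        scored = [(quality(selected + (j,)), j) for j in sorted(remaining)]
        score, j = max(scored)
        if score <= best:
            break  # no candidate improves clustering quality any further
        selected, best = selected + (j,), score
        remaining.remove(j)
    return selected, best


# Toy quality function (hypothetical): features 0 and 2 help, others cost 0.5 each.
useful = {0: 1.0, 2: 2.0}


def toy_quality(subset):
    return sum(useful.get(j, -0.5) for j in subset)


print(forward_select(4, toy_quality))
```

The first pass always accepts the best single feature, which matches starting "with a single feature"; later passes accept a feature only if it strictly raises the score.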
Jadwiga Suchecka, Jan Kowalik – The analysis of time series with outliers using the
As a result of various external factors, such as political and economic crises or outbreaks of war,
observations that clearly differ from the others, called outliers, appear in economic time series.
Outliers generally make the results of time series analyses distorted, unrealistic or even invalid.
Therefore, it is important to use appropriate procedures for detecting outliers and eliminating their
effects from time series.
This paper shows how the geostatistical kriging method can be applied to detecting outliers in time se-
Marek Walesiak – Variable selection and weighting problems in cluster analysis
The choice of variables is one of the most important steps in cluster analysis. Variables used in
applied clustering should be selected and weighted carefully: only those variables that are believed to
help discriminate the data should be included.
– the main aspects of selecting and weighting variables for clustering were characterised,
– the limitations of variable selection for cluster analysis based on data generated from
– the main approaches to variable selection and weighting for cluster analysis were discussed.
Katarzyna Kuziak – Model risk in risk management process
Model risk is understood as the special risk that arises when reality is described by a theoretical
model (every model is a simplification of reality). Model risk should be controlled before any
theoretical model is used. Theoretical models play an important role in the risk management
process, which can be divided into three activities (steps): identification, measurement, and
monitoring and control. Theoretical models are used in every step (for pricing, hedging and risk
valuation), which gives rise to an additional risk, the so-called model risk. The paper presents model
risk, its types and its sources, and gives empirical evidence.
Waldemar Tarczyński – Utilizing the P/E ratio in the analysis of securities
In the paper the author proposes an analysis of variable-income securities in which the P/E ratio is
the criterion determining their quality. The essence of the analysis was a study of the financial
standing of companies within groups obtained using the P/E ratio as the division criterion.
Companies were then divided into groups with high, medium and low P/E ratios. For each group,
synthetic measures of development were calculated, which allow the fundamental strength of
companies and groups to be assessed. Companies quoted on the Warsaw Stock Exchange were
analysed using annual data on financial ratios and the P/E ratio for the period 2000–2003.
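Synthetic measures of development are commonly built with Hellwig's pattern-based method. As a sketch under that assumption (the paper's exact variant is not given here), variables are standardised, destimulants have their sign flipped, a pattern object takes the best value of each variable, and m_i = 1 - d_i0 / d_0 with d_0 = mean(d_i0) + 2·sd(d_i0).

```python
import numpy as np


def hellwig_measure(X, stimulant):
    """Hellwig's synthetic measure of development (pattern-based variant).

    X: objects x variables; stimulant: True for stimulants, False for destimulants.
    Returns m_i, where larger values mean a position closer to the pattern.
    """
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0)       # standardise each variable
    Z = np.where(stimulant, Z, -Z)                 # turn destimulants into stimulants
    pattern = Z.max(axis=0)                        # "ideal" object
    d = np.sqrt(((Z - pattern) ** 2).sum(axis=1))  # Euclidean distance to the pattern
    d0 = d.mean() + 2 * d.std()                    # normalising distance
    return 1 - d / d0


# Hypothetical ratios: column 0 a stimulant (e.g. ROE), column 1 a destimulant (e.g. debt).
X = [[1.0, 10.0], [2.0, 5.0], [3.0, 1.0]]
print(hellwig_measure(X, [True, False]))
```

Ranking companies by m_i, or splitting them into groups by thresholds on m_i, gives the kind of assessment of fundamental strength described above.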
Anna Zamojska-Adamczak – Application of investing styles in classification of mutual funds
Mutual funds operating on the capital market in Poland can be classified on the basis of classical
decision-making criteria such as risk and return, published results, investment horizon or simplicity of
transactions. The classifications of mutual funds widely known in the literature are based on the type
of investment or on risk measures (estimated from linear regressions of returns). W.F. Sharpe
proposed classifying funds by investment style. The investment style is the method of investing chosen
by the fund. It allows funds to be grouped by type according to characteristics such as the goal of the
investment plan, the type of securities making up the portfolio, expected return, level of risk, size and
financial condition of the fund,
Ewa Chodakowska, Joanicjusz Nazarko, Anna Worobiej – MARIMA models for energy price
forecasting at the power exchange in Poland
The paper discusses the application of Multivariate AutoRegressive Integrated Moving Average
(MARIMA) models to forecasting energy prices on the Day-Ahead Market of the Power Exchange in
Poland. Energy prices are analysed and, at the same time, an attempt is made to find a relation
between price and traded volume. On the basis of the available data, a MARIMA model is proposed
that describes price as a function of its own past values and of energy volume.
Joanna Chrabołowska – Selection of economic-financial indices for comparative studies of
effectiveness of Polish electrical energy distribution utilities
The article presents the process of selecting economic and financial indices for comparative studies
of the effectiveness of domestic electrical energy distribution utilities.
Forty-two economic and financial indices related to the distribution and transmission activity of 33
distribution utilities were analysed. The available set of indices was reduced on both substantive and
statistical grounds. Factor analysis identified three dimensions (profitability and productivity,
employment effectiveness, and operational efficiency) that allow the examined distribution utilities
to be fully characterised. The companies were ranked according to each of
In the author's opinion, a multivariate approach to the assessment of efficiency is indispensable.
Alicja Ganczarek – An application of the canonical analysis on Polish energy market
This paper presents an application of canonical analysis to the day-ahead energy market, using data
recorded on CIRE. Two sets of variables, the prices and the volumes of electric energy noted on CIRE
from July 2002 to June 2004, were analysed. The aim of the paper was to classify the energy markets
and to classify the prices and volumes of electric energy within a day. This classification was then
used to describe the dependence between the two sets, which was expressed by a regression equation.
Beata Basiura – Empirical homogeneity test for Ward’s method applied to the set of Polish
The paper presents a one-sided homogeneity test that allows the agglomerative process in Ward's
method to be stopped. This approach was proposed by Sokolowski in 1992, but critical values were
given only for n = 49. The results for n = 16 are presented in this paper. Rough critical values are
obtained by analysing data sets generated from a multivariate normal distribution. They are then
smoothed using analytical functions of the number of objects and the number of attributes; these
functions turned out to fit extremely well. If the homogeneity hypothesis H0 is false, then at some
stage of the agglomeration of subgroups the agglomeration distances are greater than they would be
if H0 were true.
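The test compares the agglomeration distances of Ward's method with critical values. As a reminder of what that distance is, here is a sketch of the standard Ward merge cost, the increase in within-group sum of squares when clusters A and B are joined: n_A·n_B / (n_A + n_B) · ||c_A - c_B||². Software packages may report a rescaled version (for example a square root), so the scaling is a convention to check; the critical values themselves are not reproduced here.

```python
import numpy as np


def ward_merge_cost(A, B):
    """Increase in within-group sum of squares caused by merging clusters A and B."""
    A, B = np.asarray(A, float), np.asarray(B, float)
    na, nb = len(A), len(B)
    diff = A.mean(axis=0) - B.mean(axis=0)  # difference of centroids
    return na * nb / (na + nb) * float(diff @ diff)


def within_ss(C):
    """Within-group sum of squared deviations from the centroid."""
    C = np.asarray(C, float)
    return float(((C - C.mean(axis=0)) ** 2).sum())


A = [[0.0, 0.0], [0.0, 2.0]]
B = [[4.0, 0.0], [4.0, 2.0]]
print(ward_merge_cost(A, B))
```

The closed form agrees with recomputing the within-group sums of squares before and after the merge, which is why Ward's method can update distances without revisiting the raw data.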
Andrzej Bąk – Problems of parameters estimation in decompositional models with discrete
The paper presents some problems linked with nonmetric decompositional models. The following are
presented:
– four types of measurement scales (especially discrete scales),
– sources and consequences of the heterogeneity of consumer preferences,
– nonmetric decompositional models that include heterogeneity of preferences.
Feliks Wysocki, Aleksandra Łuczak – Application of fuzzy multi-criteria linear classification
methods to construction of synthetic measure of counties development
The aim of this paper was to investigate the applicability of fuzzy multi-criteria classification
methods to the construction of synthetic characteristics. The method starts by building a hierarchical
structure for the examined multi-criteria problem and uses linguistic variables and trapezoidal fuzzy
numbers to rate the basic characteristics and criteria for each decision element. The proposed
procedure was employed to assess the socio-economic development of rural Wielkopolska, seen as a
collection of counties.
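Trapezoidal fuzzy numbers are the building blocks of the ratings mentioned above. The sketch below shows a trapezoid (a, b, c, d), its membership function and centroid defuzzification; the linguistic scale and the averaging of ratings are hypothetical illustrations, not the paper's procedure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trapezoid:
    a: float  # membership starts rising
    b: float  # membership reaches 1
    c: float  # membership starts falling
    d: float  # membership returns to 0

    def membership(self, x):
        if x < self.a or x > self.d:
            return 0.0
        if x < self.b:  # here b > a, so no division by zero
            return (x - self.a) / (self.b - self.a)
        if x <= self.c:
            return 1.0
        return (self.d - x) / (self.d - self.c)  # here d > c

    def centroid(self):
        """Centre of gravity of the membership function (defuzzified value)."""
        a, b, c, d = self.a, self.b, self.c, self.d
        num = (c * c + c * d + d * d) - (a * a + a * b + b * b)
        return num / (3 * (c + d - a - b))


def mean_rating(ratings):
    """Average of trapezoidal ratings (component-wise, as fuzzy addition is)."""
    n = len(ratings)
    return Trapezoid(*(sum(getattr(t, f) for t in ratings) / n for f in "abcd"))


# Hypothetical linguistic scale on [0, 10].
scale = {"low": Trapezoid(0, 0, 2, 4), "medium": Trapezoid(3, 4, 6, 7), "high": Trapezoid(6, 8, 10, 10)}
print(mean_rating([scale["medium"], scale["high"]]).centroid())
```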
Józef Kupczyk – Identification and measurement of weather risk in a firm
The earnings of many firms depend on weather conditions such as temperature, precipitation and
wind speed. Such firms are exposed to weather risk. This paper presents a method for identifying and
measuring this risk in a firm; in particular, the impact of weather conditions on sales volume is
considered.
Małgorzata Łuniewska – Sector synthetic measures of development in securities analysis
The paper presents another approach to selecting stocks for a portfolio. It addresses two main
questions: is it useful to build sector synthetic measures of development in securities analysis, and
how can such a sector division be used in securities analysis? First, all securities were divided into
sectors; next, synthetic measures of development were built for these groups. The sector
classification of securities was used to construct a portfolio. The research covered all companies
listed on the Warsaw Stock Exchange since 2000. In the research, particularly in the sector
classification based on synthetic measures of development, annual economic and financial ratios were
used for the period of the years
Agnieszka Majewska – Classification of derivatives quoted on world exchanges using
generalised distance measure
The main goal of the article is to show which derivatives quoted on world exchanges were the most
popular in 2002 and 2003. For this purpose stock, index, interest rate, currency and commodity
options and futures were classified, using the Generalised Distance Measure as the classification
method. The activity of investors on derivatives markets was measured by three variables: volume
traded, underlying value and open interest.
The results show that interest rate futures, stock options and interest rate options were the most
popular derivatives worldwide, as well as in Europe.
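As a sketch of the distance itself, assuming the GDM1 variant for interval and ratio data (the variant and weights used in the paper are not stated here), the distance between objects i and k is d_ik = 1/2 - [Σ_j w_j a_ikj b_kij + Σ_j Σ_{l≠i,k} w_j a_ilj b_klj] / (2 [Σ_j Σ_l w_j a_ilj² · Σ_j Σ_l w_j b_klj²]^{1/2}), with a_ilj = x_ij - x_lj and b_klj = x_kj - x_lj. It compares how i and k relate to every other object, so it takes values in [0, 1].

```python
import numpy as np


def gdm1(X, w=None):
    """Generalised Distance Measure (GDM1 variant) for interval/ratio data."""
    X = np.asarray(X, float)
    n, m = X.shape
    w = np.ones(m) if w is None else np.asarray(w, float)
    D = np.zeros((n, n))
    for i in range(n):
        for k in range(i + 1, n):
            a = X[i] - X  # a_ilj = x_ij - x_lj for every object l
            b = X[k] - X  # b_klj = x_kj - x_lj
            own = np.sum(w * (X[i] - X[k]) * (X[k] - X[i]))  # a_ikj * b_kij term
            mask = np.ones(n, bool)
            mask[[i, k]] = False  # l runs over objects other than i and k
            others = np.sum(w * a[mask] * b[mask])
            denom = 2 * np.sqrt(np.sum(w * a ** 2) * np.sum(w * b ** 2))
            D[i, k] = D[k, i] = 0.5 - (own + others) / denom
    return D


# Hypothetical one-variable example: objects at 0, 1 and 10.
print(np.round(gdm1([[0.0], [1.0], [10.0]]), 3))
```

The denominator vanishes only when all objects are identical, in which case the distance is undefined.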
Katarzyna Halicka – Forecast of energy prices on Polish Power Exchange with use of neural
This paper presents the use of neural networks to forecast energy prices on the Polish Power
Exchange. Three forecasting models, differing in their input variables and in the number of neurons
per layer, were built. The following input variables were used: energy demand, air temperature,
wind speed, cloud cover and rainfall. Energy prices were forecast with the constructed models, and
the quality of the forecasts was then evaluated.
Joanicjusz Nazarko, Mikołaj Rybaczuk, Arkadiusz Jurczuk – Influence of random noise on
an identification of ARIMA models
The main problem in time series modelling with ARIMA models is identifying the model class
(autoregressive, moving average, or mixed autoregressive-moving average) and its order.
Identification is based on analysing plots of the autocorrelation function (AC) and the partial
autocorrelation function (PAC). The paper presents the results of a simulation study of how the
presence of white noise, and its variance, influence the identification of basic ARIMA models. The
effect of the noise variance on the quality of ARIMA model estimation is also presented. It was found
that the possibility of correct identification and estimation depends strongly on the presence and
variance of random noise.
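A small simulation in the spirit of the study (the settings are hypothetical, not the paper's design) shows why noise disturbs identification. Adding independent white noise with variance σn² to a stationary series with variance σx² scales every autocorrelation at lag h ≥ 1 by σx² / (σx² + σn²), and an AR(1) observed with noise behaves like an ARMA(1,1), so the AC/PAC pattern no longer looks like a pure AR(1).

```python
import numpy as np


def sample_acf(x, max_lag):
    """Sample autocorrelations for lags 0..max_lag."""
    x = np.asarray(x, float) - np.mean(x)
    denom = np.dot(x, x)
    return np.array([np.dot(x[: len(x) - h], x[h:]) / denom for h in range(max_lag + 1)])


def simulate_ar1(phi, n, rng):
    """AR(1) process x_t = phi * x_{t-1} + e_t with standard normal innovations."""
    e = rng.standard_normal(n)
    x = np.empty(n)
    x[0] = e[0] / np.sqrt(1 - phi ** 2)  # start from the stationary distribution
    for t in range(1, n):
        x[t] = phi * x[t - 1] + e[t]
    return x


rng = np.random.default_rng(0)
clean = simulate_ar1(0.8, 5000, rng)
noisy = clean + rng.normal(scale=2.0, size=clean.size)  # additive white noise
print("clean:", np.round(sample_acf(clean, 3), 2))
print("noisy:", np.round(sample_acf(noisy, 3), 2))
```

With φ = 0.8 and noise standard deviation 2, the theoretical lag-1 autocorrelation drops from 0.8 to roughly 0.33, while the geometric decay rate between later lags is unchanged.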
Urszula Kołaszewska – The usage of cluster analysis in analogy forecasting
In this work a spatio-temporal analogy method was applied to forecast factors influencing
e-commerce development in Poland. To choose pattern objects for the forecasted object, cluster
analysis and ranking were carried out. From the group of countries classified as the best prepared,
countries similar to Poland were chosen. The similarity of objects was estimated on the basis of the
shape criterion.
Jerzy Korzeniewski – Proposal of a new algorithm for determining the number of clusters
The new algorithm is based on comparing pseudo cumulative distribution functions of a certain
random variable, defined as follows. For a fixed window size we draw k different points and, for
every point, find the corresponding limiting point of the mean shift procedure. Then we check
whether the distance (e.g. Euclidean) between every pair of limiting points is smaller than the
window size. The probability that this condition is met is the value of the pseudo cumulative
distribution function at the point equal to the window size. Analogously, we determine the pseudo
cumulative distribution functions for different numbers k of clusters. The proper number of clusters
corresponds to the last curve (with respect to k) that has a horizontal phase at a height smaller than 1.
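One point of the pseudo cumulative distribution function can be estimated by Monte Carlo, following the description literally. This sketch assumes a flat (uniform) kernel for mean shift, sampling of the k starting points from the data without replacement, and a fixed number of draws; the paper may use a different kernel or sampling scheme, and the curve comparison over k is not shown.

```python
import numpy as np


def mean_shift_limit(X, start, h, tol=1e-6, max_iter=500):
    """Flat-kernel mean shift: repeatedly move to the mean of points within distance h."""
    x = np.asarray(start, float)
    for _ in range(max_iter):
        near = X[np.linalg.norm(X - x, axis=1) <= h]
        if len(near) == 0:
            return x  # safety guard; a data point always lies within h of the mean
        new = near.mean(axis=0)
        if np.linalg.norm(new - x) < tol:
            return new
        x = new
    return x


def pseudo_cdf_value(X, k, h, n_draws=200, seed=0):
    """Estimate P(all pairwise distances between the k limiting points are < h)."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_draws):
        idx = rng.choice(len(X), size=k, replace=False)
        limits = [mean_shift_limit(X, X[i], h) for i in idx]
        ok = all(np.linalg.norm(limits[p] - limits[q]) < h
                 for p in range(k) for q in range(p + 1, k))
        hits += ok
    return hits / n_draws


# Hypothetical data: two tight, well-separated groups of 20 points each.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.1, (20, 2)), rng.normal(10, 0.1, (20, 2))])
for h in (1.0, 50.0):
    print(h, [pseudo_cdf_value(X, k, h) for k in (1, 2, 3)])
```

Evaluating `pseudo_cdf_value` on a grid of window sizes h traces one curve per k, and those curves are what the criterion above compares.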
Kamila Migdał-Najman, Krzysztof Najman – Analytical procedures for determining the
number of clusters
Several clustering techniques have been proposed for the analysis of data sets, and cluster validity
indices are useful tools to support this task. They are particularly relevant in applications in which
there is no a priori indication of the actual number of clusters. In this paper two validity indices
were applied to fifteen data sets, using different intracluster and intercluster distances. The
resulting optimal partitions were found to be stable for the different validity indices used, namely
the Davies-Bouldin index and Dunn's index. It was shown that these methods may support the
prediction of the optimal partition for those data sets, but determining the optimal number of
clusters remains an open problem.
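The two indices can be sketched with their common textbook definitions: Davies-Bouldin averages, over clusters, the worst ratio (s_i + s_j) / d(c_i, c_j) of within-cluster scatter to centroid separation (lower is better), while Dunn divides the smallest between-cluster point distance by the largest cluster diameter (higher is better). The paper varies the intracluster and intercluster distances, so these particular choices are one assumed variant.

```python
from itertools import combinations

import numpy as np


def davies_bouldin(X, labels):
    """Davies-Bouldin index with centroid-based scatter and separation (lower is better)."""
    X, labels = np.asarray(X, float), np.asarray(labels)
    ks = np.unique(labels)
    cents = np.array([X[labels == k].mean(axis=0) for k in ks])
    # s_i: mean distance of cluster members to their centroid
    s = np.array([np.linalg.norm(X[labels == k] - c, axis=1).mean() for k, c in zip(ks, cents)])
    worst = []
    for i in range(len(ks)):
        ratios = [(s[i] + s[j]) / np.linalg.norm(cents[i] - cents[j])
                  for j in range(len(ks)) if j != i]
        worst.append(max(ratios))
    return float(np.mean(worst))


def dunn(X, labels):
    """Dunn index: minimum inter-cluster distance / maximum diameter (higher is better)."""
    X, labels = np.asarray(X, float), np.asarray(labels)
    groups = [X[labels == k] for k in np.unique(labels)]

    def pairwise(A, B):
        return np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)

    diam = max(pairwise(G, G).max() for G in groups)  # largest within-cluster distance
    sep = min(pairwise(A, B).min() for A, B in combinations(groups, 2))  # closest pair
    return float(sep / diam)


X = [[0, 0], [0, 1], [10, 0], [10, 1]]
print(davies_bouldin(X, [0, 0, 1, 1]), dunn(X, [0, 0, 1, 1]))
print(davies_bouldin(X, [0, 1, 0, 1]))
```

In practice each index is computed for the partitions obtained with several candidate numbers of clusters, and the best-scoring partition is taken as a suggestion rather than a proof.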
Dorota Rozmus – Error decomposition in aggregated discriminant models
The idea of error decomposition comes from regression, where the squared loss function is used. The
prediction error is decomposed into three components: noise, bias and variance.
There have also been attempts to apply error decomposition to classification, but there the sum of
the three components differs from the classification error. For this reason, several authors have
proposed their own definitions of the decomposition components for classification problems. This
paper presents and compares the known decompositions for the 0-1 loss.
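For squared loss the decomposition is exact: at a fixed input x, with the target Y independent of the fitted model f̂, E[(Y - f̂(x))²] = σ² + (f(x) - E f̂(x))² + Var f̂(x). The sketch below checks the empirical counterpart at a single point, assuming the true f(x) is estimated by the mean of the Y draws; the data are hypothetical.

```python
import numpy as np


def squared_loss_decomposition(y_draws, predictions):
    """Split the mean squared error over all (y, prediction) pairs at one input point.

    y_draws: independent draws of the target Y at x.
    predictions: predictions at x from models trained on different samples.
    """
    y = np.asarray(y_draws, float)
    p = np.asarray(predictions, float)
    noise = y.var()                     # spread of Y around its mean
    bias2 = (y.mean() - p.mean()) ** 2  # squared gap between target mean and average model
    variance = p.var()                  # spread of predictions across training sets
    mse = np.mean((y[:, None] - p[None, :]) ** 2)  # Y paired with every model
    return mse, noise, bias2, variance


rng = np.random.default_rng(0)
f_x = 2.0  # hypothetical true regression value at x
y_draws = f_x + rng.normal(0, 0.5, size=1000)
predictions = 0.8 * f_x + rng.normal(0, 0.3, size=200)  # hypothetical biased, noisy model
mse, noise, bias2, variance = squared_loss_decomposition(y_draws, predictions)
print(mse, noise + bias2 + variance)
```

The identity rests on expanding a square. Under the 0-1 loss no such expansion exists, which is why the classification decompositions compared in the paper define the three terms differently.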
Mirosława Czerniawska, Joanicjusz Nazarko, Mikołaj Rybaczuk – Graphical methods in
comparative studies of value systems
The paper concerns a comparative study of value systems carried out among 644 students of Polish and
Russian high schools. Factors such as nationality, gender and field of education were taken into
consideration. The paper demonstrates the usefulness of graphical methods in such analyses, one
reason being that they increase the efficiency of human information processing. The authors applied
a graphical method that presents the structure of multidimensional data on a plane. This allowed a
multi-perspective view of the studied value systems and an in-depth interpretation of the research
results.
Marta Dziechciarz – Employee satisfaction survey in the business organization. Classification
The paper shows an application of cluster analysis to the classification of employees in an
organization. Over 38 percent of employees took part in a job-related satisfaction survey conducted in
a branch of a multinational company. The surveyed firm employs almost 600 people. Following the
procedure proposed by Calinski and Harabasz, a classification into seven clusters was chosen. The
k-means method was used for classification. In the first step, 40 variables were used as the basis for
classification. In the second step, each employee was described by 15 variables corresponding to
motivation issues in the company.
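The Calinski-Harabasz criterion mentioned above compares between-cluster and within-cluster dispersion: CH(k) = [B / (k - 1)] / [W / (n - k)], where B and W are the between- and within-cluster sums of squares, and the number of clusters with the largest CH is preferred. A minimal sketch follows; the toy data are hypothetical, and where scikit-learn is available its `calinski_harabasz_score` can serve as a cross-check.

```python
import numpy as np


def calinski_harabasz(X, labels):
    """Calinski-Harabasz pseudo-F for a given partition (higher is better)."""
    X, labels = np.asarray(X, float), np.asarray(labels)
    n, ks = len(X), np.unique(labels)
    k = len(ks)
    overall = X.mean(axis=0)
    between = within = 0.0
    for c in ks:
        G = X[labels == c]
        centroid = G.mean(axis=0)
        between += len(G) * np.sum((centroid - overall) ** 2)  # B: weighted centroid spread
        within += np.sum((G - centroid) ** 2)                  # W: scatter inside the cluster
    return (between / (k - 1)) / (within / (n - k))


X = [[0, 0], [0, 1], [10, 0], [10, 1]]
print(calinski_harabasz(X, [0, 0, 1, 1]))
```

In the survey setting, the index would be computed for k-means partitions with different k and the maximum used to choose the number of clusters.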
Paweł Lula – The cluster analysis of Polish text documents
The main purpose of the paper is to study the process of cluster analysis of Polish text documents.
We consider the influence of different factors (the choice of text representation method, use of a
stop-list and stemming, application of Latent Semantic Analysis, and the choice of clustering method)
on the results of the clustering procedure. In the final part of the paper, recommendations for
clustering Polish texts are formulated on the basis of the experimental results.
Krzysztof Szwarc – An attempt at using path analysis in research on the poverty gap
Path analysis is a method for measuring the influences between variables. It makes it possible to
identify the factors that determine low earnings, and hence the variables that determine the poverty
gap. The research was carried out on poor households living in the Jezyce district of Poznan. The
factors determining earnings include, among others, the age, sex, health and education of the head of
household, as well as the place of residence, the number of people in the household and the number of
working people in the household. By means of path analysis I measured the influence of selected
variables on the earnings of these households.
Cyprian Kozyra – Application of logistic regression to analysis of perceived health
The paper presents an application of the cumulative logit model to the analysis of multinomial data
obtained from a health status survey of Torun inhabitants. Logistic regression models were used to
test the dependence of perceived health on demographic characteristics such as age, income and
education. The proportional odds model, with equal slopes for all categories, was compared with the
null (independence) model and with the full model, in which regression coefficients are estimated
for each category. The paper shows the following statistical relationships between perceived health
and the demographic characteristics of respondents:
– the older the respondent, the poorer the perceived health,
– the higher the respondent's earnings, the better the perceived health,
– the longer the education, the better the perceived health.
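Under the proportional odds model the cumulative probabilities share one slope vector: P(Y ≤ j | x) = 1 / (1 + exp(-(θ_j - x·β))) with increasing thresholds θ_j. Sign conventions differ between packages, so the parametrisation here is an assumption, and the thresholds and coefficient below are hypothetical, not estimates from the Torun survey.

```python
import numpy as np


def proportional_odds_probs(x, beta, thresholds):
    """Category probabilities under a cumulative logit model with equal slopes.

    P(Y <= j | x) = logistic(theta_j - x @ beta); thresholds must be increasing.
    A positive x @ beta shifts probability towards higher categories.
    """
    eta = np.asarray(thresholds, float) - np.dot(x, beta)
    cum = 1.0 / (1.0 + np.exp(-eta))
    cum = np.concatenate([[0.0], cum, [1.0]])
    return np.diff(cum)  # P(Y = j) = P(Y <= j) - P(Y <= j - 1)


# Hypothetical: 4 ordered categories from "very good" to "poor" health,
# one standardised covariate (age) with a positive coefficient.
theta = [-1.0, 0.0, 1.0]
print(np.round(proportional_odds_probs([0.0], [1.0], theta), 3))  # average age
print(np.round(proportional_odds_probs([2.0], [1.0], theta), 3))  # older respondent
```

Because β is shared across thresholds, a single coefficient summarises the direction of each relationship listed above, which is what the comparison with the full model tests.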
Małgorzata Misztal – The use of recursive partitioning method to identify operative risk
subgroups among patients with coronary artery disease
The decision to perform Coronary Artery Bypass Grafting (CABG) surgery on a patient with coronary
disease is taken under conditions of risk and uncertainty, so the benefits of CABG must be balanced
against its risks.
The study was conducted to identify preoperative risk factors associated with morbidity among
patients undergoing isolated CABG and to develop classification rules assigning patients to selected
risk subgroups. Prediction rules were established on the basis of tree-structured models. The
following tree-based algorithms were used: QUEST, CRUISE, LOTUS and PLUS.
Jadwiga Suchecka, Sylwia Nieszporska – EQ-5D questionnaires as a method of indirect
measurement of the quality of health state
Measurements based on health assessment from the patient's perspective are increasingly used in
current studies of quality of life. Such measurements serve to assess the equality and effectiveness of
different therapies, and they are also becoming a basis for the economic assessment of the health care
system. The main aim of this paper is to evaluate the health of patients in one of the hospitals in
Czestochowa using the EQ-5D descriptive system. This system is interesting because it not only
describes the health state of the respondents comprehensively, but also delivers a weighted index of
health state that can be used in the economic and clinical assessment of health care.
Ewa Chodakowska – An application of multivariate ARIMA models in SCA statistical system
MARIMA models can be used in a variety of applications. This class of models effectively connects
the basic concepts of the regression model with those of ARIMA models. However, they are not
frequently used in practice, because popular statistical software does not include algorithms for
MARIMA model identification and estimation. The SCA Statistical System is one of the few systems
that support the MARIMA forecasting method. This paper presents the methodology of building
MARIMA models in SCA. On the basis of a time series analysis, the consecutive phases of the
model-building process are described and discussed.
Andrzej Dudek – Factorial analysis of symbolic objects
This article presents some extensions of factorial analysis and principal component analysis to data
represented as a symbolic object table (a matrix with symbolic objects as rows). Each cell of a
symbolic object table can contain a single quantitative value, a categorical value, an interval, a
multivalued variable or a multivalued variable with weights. The Vertices and Centers algorithms,
which implement principal component analysis for symbolic objects, are described, and an example of
principal component analysis on data in the form of numeric intervals is presented.
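A sketch of the Centers idea for interval data, under stated assumptions: PCA is run on the standardised interval midpoints, and each object's hyperrectangle is then projected onto the principal axes, with the minimum and maximum over its vertices giving an interval-valued score. The standardisation choice and the example intervals are assumptions; the article's implementation may differ in such details.

```python
from itertools import product

import numpy as np


def centers_pca(lower, upper, n_comp=2):
    """Centers-method PCA for interval data; returns lower and upper component scores."""
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    centers = (lower + upper) / 2
    mean, sd = centers.mean(axis=0), centers.std(axis=0)  # assumes no constant variable
    Zc = (centers - mean) / sd
    eigval, eigvec = np.linalg.eigh(np.cov(Zc, rowvar=False, bias=True))
    order = np.argsort(eigval)[::-1][:n_comp]  # largest eigenvalues first
    V = eigvec[:, order]
    lo = np.empty((len(lower), n_comp))
    hi = np.empty_like(lo)
    for i in range(len(lower)):
        # All 2^p vertices of the i-th hyperrectangle, standardised like the centers.
        verts = np.array(list(product(*zip(lower[i], upper[i]))))
        scores = ((verts - mean) / sd) @ V
        lo[i], hi[i] = scores.min(axis=0), scores.max(axis=0)
    return lo, hi


# Hypothetical interval data: 4 objects, 2 interval variables.
L = np.array([[1.0, 2.0], [3.0, 1.0], [5.0, 6.0], [2.0, 4.0]])
U = L + np.array([[1.0, 1.0], [0.5, 2.0], [1.0, 1.0], [2.0, 0.5]])
lo, hi = centers_pca(L, U)
print(np.round(lo, 2))
print(np.round(hi, 2))
```

Since a linear projection of a box attains its extremes at vertices, the min/max over vertices gives the exact projected interval; enumerating 2^p vertices is only practical for a few variables.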
Marcin Pełka – Taxonomic variables in symbolic dissimilarity measure in symbolic data
The aim of this article is to present all types of dissimilarity measures used in symbolic data
analysis (such as the Gowda-Diday, Ichino-Yaguchi and De Carvalho measures), together with the
problems that arise in each of them for specific variable types, especially taxonomic variables. The
article also proposes Polish equivalents of the terms used in dissimilarity measures. An analysis of
the car market is used as an empirical example.
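For interval-valued variables, the Ichino-Yaguchi component is φ(A, B) = |A ⊕ B| - |A ⊗ B| + γ(2|A ⊗ B| - |A| - |B|), with 0 ≤ γ ≤ 0.5, where ⊕ is the Cartesian join (the smallest interval covering both) and ⊗ the Cartesian meet (the intersection). The sketch below covers interval variables only and combines components with a Minkowski-type sum; normalisation by domain length, and the handling of taxonomic variables that the article focuses on, are not included.

```python
def ichino_yaguchi_phi(A, B, gamma=0.5):
    """Ichino-Yaguchi component for two intervals A = (a1, a2) and B = (b1, b2)."""
    (a1, a2), (b1, b2) = A, B
    join = max(a2, b2) - min(a1, b1)            # |A (+) B|, Cartesian join
    meet = max(0.0, min(a2, b2) - max(a1, b1))  # |A (x) B|, Cartesian meet (0 if disjoint)
    return join - meet + gamma * (2 * meet - (a2 - a1) - (b2 - b1))


def ichino_yaguchi(x, y, gamma=0.5, q=1):
    """Minkowski-type combination of the components over interval variables."""
    return sum(ichino_yaguchi_phi(A, B, gamma) ** q for A, B in zip(x, y)) ** (1 / q)


# Hypothetical cars described by two interval variables (e.g. price and engine size bands).
car1 = [(0.0, 1.0), (0.0, 2.0)]
car2 = [(2.0, 3.0), (1.0, 3.0)]
print(ichino_yaguchi(car1, car2))
```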
Justyna Wilk – Hierarchical clustering methods in symbolic data analysis
The article looks at the types and concepts of symbolic data and then reviews the available
hierarchical clustering methods (traditional and symbolic) for analysing such data. Symbolic data are
defined and contrasted with classical data, and examples are presented.
Małgorzata Wójcik – An approach to symbolic factorial discriminant analysis of symbolic objects
The article introduces the basics of Factorial Discriminant Analysis. The method makes it possible to
discover dependencies among the characteristics of symbolic objects. The main objectives are to
define geometrical classification rules and to visualize, on a factorial plane, the classes of symbolic
objects separated from each other as well as possible. Finally, an example based on the SODAS
programme is presented.
Marta Borda – Evaluation of financial standing of the life insurance companies with selected
In view of the specific nature of insurance activity, insurance companies are more exposed to
insolvency risk than other enterprises. From the insurer's point of view, evaluating the financial
condition enables the identification and measurement of risk arising in various areas of financial
management. The paper presents an attempt to evaluate the financial standing of life insurance
companies operating in the Polish market between 1998 and 2002 using selected taxonomic methods.
To this end, the life insurers were ranked according to a synthetic development measure (TMKF) and
classified using the k-means method and Ward's method.
Urszula Gierałtowska – Using discriminant function as an instrument of portfolio selection in
the capital market
In the article the author uses a discriminant function, with an agglomerative measure as the
discrimination criterion, to evaluate the economic and financial condition of companies. Companies
listed on the Warsaw Stock Exchange were classified on the basis of economic and financial measures
for the years 2001–2003. The classification results were acceptable and were used to predict to which
of the two groups a company would belong. The results of this evaluation were compared with those
of nonclassical methods. Although the structural assumptions of the discriminant function were not
fulfilled, it nevertheless proved to be a good identifier.
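As a reference point for the two-group discriminant function, here is a sketch of Fisher's linear discriminant with a pooled covariance matrix and equal priors (both assumptions; the paper's criterion and data are not reproduced). The structural assumptions mentioned above, roughly multivariate normality with equal covariance matrices, are exactly what this rule presumes.

```python
import numpy as np


def fisher_lda(X0, X1):
    """Two-group Fisher discriminant: returns weights w and cut-off c (equal priors)."""
    X0, X1 = np.asarray(X0, float), np.asarray(X1, float)
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Pooled within-group covariance matrix (needs at least two variables here).
    S = ((len(X0) - 1) * np.cov(X0, rowvar=False)
         + (len(X1) - 1) * np.cov(X1, rowvar=False)) / (len(X0) + len(X1) - 2)
    w = np.linalg.solve(S, m1 - m0)  # direction maximising between/within separation
    c = w @ (m0 + m1) / 2            # midpoint between projected group means
    return w, c


def predict(X, w, c):
    """Assign group 1 when the discriminant score exceeds the cut-off."""
    return (np.asarray(X, float) @ w > c).astype(int)


# Hypothetical financial ratios for "weak" (0) and "strong" (1) companies.
X0 = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
X1 = X0 + 5.0
w, c = fisher_lda(X0, X1)
print(w, c, predict(np.vstack([X0, X1]), w, c))
```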
Magdalena Mojsiewicz, Katarzyna Wawrzyniak – Methodology of insurance market
The article presents a classification of market segmentation approaches, from socio-demographic
through preference-based to behavioural segmentation. For a sample of Polish households, for which
brand loyalty, risk perception, independence in buying insurance and price were observed, four classes
of clients were distinguished. Most cases fell into the first class, the so-called non-rational clients.
For this group, segmentation by inclinations was proposed.
Daniel Papla – Analysis of stock market prices in Poland and other countries using
conditional α-stable distribution and copula functions
This paper presents an attempt to use the conditional α-stable distribution to estimate the real
distribution of selected stock market returns. Copula functions, also in conditional form, were used
to analyse the relations between markets. The first part of the paper presents the theoretical
background of the methodology, discussing both the α-stable distribution and copula functions. The
detailed methodology is presented in the second part, including a detailed description of how the
parameters of the stable distribution and of the copula function are made conditional. The third part
presents the results of the empirical research, based on indices from the capital markets of Poland,
the USA, the UK, Germany, France, the Czech Republic, Hungary and Slovakia. An interpretation of the
results concludes the paper.
Krzysztof Piontek – Modeling skewness and excess kurtosis in stock returns using conditional
Pearson type IV distribution
AR-GARCH models with conditional normal or Student's t distributions are usually used to model many
effects that occur in financial return series. These two distributions cannot account for skewness in
the data. Therefore, there is a real need for an asymmetric density that can be easily estimated and
whose tails are sufficiently heavy. The Pearson type IV density is such a distribution and, additionally,
nests the normal and Student's t distributions. The results showed that for 30% of the analysed
instruments from the Polish market the best model was the AR(1)-GJR-GARCH(1,1) model with a
Pearson type IV conditional error distribution. The results can be applied to risk measurement with
the Value at Risk method or to option pricing.
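The GJR-GARCH(1,1) part of the winning model lets negative shocks raise volatility more than positive ones: σ²_t = ω + (α + γ·1[ε_{t-1} < 0])·ε²_{t-1} + β·σ²_{t-1}. The sketch below only filters conditional variances for given residuals ε_t (for example from an AR(1) mean equation). The parameter values are hypothetical, and the Pearson type IV density, which would enter the likelihood used for estimation, is not implemented here.

```python
import numpy as np


def gjr_garch_variance(eps, omega, alpha, gamma, beta, sigma2_0):
    """Conditional variances of a GJR-GARCH(1,1) process.

    sigma2[t] is the variance for eps[t]; the last element is the one-step-ahead forecast.
    """
    eps = np.asarray(eps, float)
    sigma2 = np.empty(len(eps) + 1)
    sigma2[0] = sigma2_0
    for t in range(1, len(eps) + 1):
        shock = eps[t - 1]
        leverage = gamma if shock < 0 else 0.0  # extra weight for bad news
        sigma2[t] = omega + (alpha + leverage) * shock ** 2 + beta * sigma2[t - 1]
    return sigma2


# Hypothetical parameters: the same-sized negative and positive shocks.
params = dict(omega=0.1, alpha=0.05, gamma=0.1, beta=0.8, sigma2_0=1.0)
print(gjr_garch_variance([-1.0], **params)[1], gjr_garch_variance([1.0], **params)[1])
```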
Artur Kozłowski, Anna Małgorzata Olszewska – The usage of selected statistical methods in
The authors surveyed a sample representing the customers of a brand's product. On the basis of the
collected data, brand identification was estimated and customer satisfaction with the purchase was
analysed. The analysis was conducted by means of logit models, multiple choice models and
discriminant analysis.
The presented methods may be used by firms to diagnose the factors influencing the perception of the
examined brand's products. Applying these methods makes it possible to select the factors that are
crucial for strengthening the brand in the market.
Adam Kurzydłowski, Artur Zaborski – The assessment of the attractiveness of language
schools by multivariate methods
The paper presents the results of research carried out in the five biggest language schools in Jelenia
Góra. The aim of the analysis was to identify the factors that determined the choice of school and to
evaluate the market position of the selected schools. The research was conducted by means of
two methods of multivariate statistical analysis: classification trees and multidimensional scaling.
Aneta Rybicka – Conjoint analysis methods and discrete choice methods
The paper presents decompositional approaches to measuring consumers' preferences: conjoint analysis
methods and discrete choice methods. The characteristics of these methods, their advantages and
disadvantages, and their similarities and differences are also described.
Mariusz Grabowski – Efficiency of selected methods for imputation of incomplete data and
different missing data mechanisms
Incomplete data, an issue addressed by various authors, constitute a serious problem for many classical
data analysis methods. The concept of using the SOM neural network (Kohonen 1995) as an effective
technique for handling data vectors containing missing components was presented in the author's
previous papers. This article compares SOM with other advanced missing data handling methods across
different missing data mechanisms (Little, Rubin 1987): MCAR, MAR and NMAR. The comparison is based
on artificially created multidimensional random data sets with various predefined distributions.
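The three mechanisms differ only in what the probability of a value being missing depends on. Under MCAR it depends on nothing. Under MAR it depends on other, observed variables. Under NMAR it depends on the missing value itself. The following is a minimal sketch, assuming a fully observed variable x and a variable y that receives the gaps. The deterministic "top share" rule for MAR and NMAR is chosen purely for illustration. The abstract does not describe how the paper's artificial data sets were generated. The function name is hypothetical.

```python
import random


def make_missing(x, y, mechanism, rate=0.3, seed=0):
    """Return a copy of y with some entries replaced by None.

    MCAR: each y_i is dropped with probability `rate`, independently of the data.
    MAR:  y_i is dropped when the fully observed x_i is among the top `rate` share of x.
    NMAR: y_i is dropped when y_i itself is among the top `rate` share of y.
    """
    n = len(y)
    k = int(round(rate * n))
    if mechanism == "MCAR":
        rng = random.Random(seed)
        drop = {i for i in range(n) if rng.random() < rate}
    elif mechanism == "MAR":
        drop = set(sorted(range(n), key=lambda i: x[i], reverse=True)[:k])
    elif mechanism == "NMAR":
        drop = set(sorted(range(n), key=lambda i: y[i], reverse=True)[:k])
    else:
        raise ValueError("mechanism must be 'MCAR', 'MAR' or 'NMAR'")
    return [None if i in drop else v for i, v in enumerate(y)]


# Usage on toy data where y decreases as x increases:
x = list(range(1, 11))
y = [float(v) for v in range(10, 0, -1)]
y_mar = make_missing(x, y, "MAR")
y_nmar = make_missing(x, y, "NMAR")
```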
Mariusz Kubus – The application of the logical functions for nominal attributes ordering
This paper presents a method of ordering nominal attributes and its application to class description
with the use of logical trees. A minimal logical tree orders attributes according to the generality of
the description of the set of objects. The obtained order is used to search the description space. The
last variables in the order have little meaning in creating classification rules and can often be
omitted. The proposed method is compared with the well-known AQ11 algorithm.
Jan Meus, Justyna Stefaniak – Analysis and data classification in the aspect of coefficient Z
Publications on the analysis of contingency tables in the statistical literature frequently focus on
establishing the statistical significance of the relationship between the investigated variables. This
leads to treating the variables as single states, although they are usually defined as systems of events
that make up the specific variable. The reason for this approach is the lack of a method that would
treat both variables as a system of two systems of disjoint events and would allow the analysis of the
"interior" structure of the data. In this paper a new asymmetric measure of dependence, based on the
above assumption, is introduced.
Joanna Trzęsiok – Adaptive kernel methods of regression
In this article kernel methods for regression are presented. These methods are based on the theory of
local estimation, in which an unknown function needs to be estimated only at a single point. The
methods do not require the analytic form of the model, so they are adaptive methods. An example
is given for illustration.
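As an illustration of estimating an unknown function at a single point, the sketch below implements the basic Nadaraya-Watson (local constant) kernel estimator with a Gaussian kernel. Treating this estimator as representative is an assumption. The adaptive variants discussed in the paper may use local polynomials or data-driven bandwidths. The bandwidth h here is a hypothetical user-supplied parameter.

```python
import math


def gaussian_kernel(u):
    """Standard normal density used as the kernel K(u)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)


def nadaraya_watson(x0, xs, ys, h):
    """Estimate m(x0) = E[Y | X = x0] as a kernel-weighted average of the ys."""
    weights = [gaussian_kernel((x0 - xi) / h) for xi in xs]
    total = sum(weights)
    if total == 0:
        raise ValueError("all kernel weights are zero; increase the bandwidth h")
    return sum(w * yi for w, yi in zip(weights, ys)) / total


# Usage: no model formula is specified; only the data and a bandwidth are needed.
xs = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
ys = [0.1, 0.4, 0.9, 1.0, 0.8, 0.3, 0.1]
estimate_at_1_2 = nadaraya_watson(1.2, xs, ys, h=0.5)
```

A smaller h follows the data more closely and a larger h smooths more. Choosing h from the data is exactly where "adaptive" refinements come in.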
Michał Trzęsiok – Support vector machines for regression
For nonlinear regression problems, support vector machines (SVM) first map the input space into a
high-dimensional feature space and then perform linear regression in that feature space. The
nonlinearity of SVM is realized through the choice of the kernel function. The performance of SVM is
very sensitive to the choice of the kernel and the model parameters. In the paper the method is
presented and the dependency of its performance on the kernel and the model parameters selection
Jacek Batóg – Analysis of relation between market value and kind of economic activity of
public companies and rate of return as a base of investment strategy
The efficiency of an investment decision depends on many factors and assumptions. In the paper the
author tried to verify the hypothesis that the market value and type of economic activity of public
companies have a significant influence on the rate of return.
The research covers two different periods: one of decreasing and one of increasing share prices. The
procedure used was based on analysis of variance. The author formulated a diagnosis about the optimal
number of clusters by means of a measure of heterogeneity based on entropy
Mariola Chrzanowska – Application of selected models for the borrowers’ classification
The work presents research on the repayment of occasional credits taken out in March 2004. The
sample consists of 162 borrowers. According to data from the end of July 2004, 82 borrowers were
repaying the credit regularly and 80 failed to repay it. The aim of the report is to construct a
dichotomous classification of borrowers using logit and probit models.
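The logit and probit models differ only in the link function that turns the linear index z = b0 + b1·x1 + … + bk·xk into a probability. Logit uses the logistic function, and probit uses the standard normal distribution function Φ. A minimal sketch of the resulting dichotomous classification follows. The coefficients, the 0.5 cutoff and the coding of class 1 are hypothetical, and the estimation of the models is not shown.

```python
import math


def logit_prob(z):
    """Logistic CDF, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def probit_prob(z):
    """Standard normal CDF Phi(z) via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def classify(features, coefs, intercept, link="logit", cutoff=0.5):
    """Return (probability, class) with class 1 if probability >= cutoff.

    Class 1 could mean, e.g., 'repays regularly' (a hypothetical coding).
    """
    z = intercept + sum(b * x for b, x in zip(coefs, features))
    p = logit_prob(z) if link == "logit" else probit_prob(z)
    return p, int(p >= cutoff)


# Hypothetical borrower with one feature and made-up coefficients:
p_logit, cls_logit = classify([1.0], [2.0], -1.0, link="logit")
p_probit, cls_probit = classify([1.0], [2.0], -1.0, link="probit")
```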
Sebastian Majewski, Mariusz Doszyń – Classification of aggressive investment funds
according to propensity to risk
The main goal of the article was to classify aggressive investment funds according to their propensity
to risk. Definitions of propensity and of propensity to risk were discussed. Possibilities of measuring
propensities, especially financial institutions' propensities to risk, were also presented in the paper.
Dependences between the propensity to risk and the rate of return were
Sebastian Majewski – Using fuzzy clustering for classification of stocks by investment
The investment attractiveness of stocks was described by the value of the membership function, which
expresses the degree of belief that a given stock belongs to one of three sets: "attractive" stocks,
"non-attractive" stocks and stocks from the uncertain area. The analysis was based on two-dimensional
and seven-dimensional characteristics of objects. Weekly data from 2002 were used in the research.
The fuzzy c-means method was used to classify the stocks. When fuzzy clustering is used to classify
stocks, the investor has more convenient information for making investment decisions.
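The fuzzy c-means algorithm alternates two updates. Cluster centers are recomputed as membership-weighted means with weights u_ij^m. Memberships are recomputed as u_ij = 1 / Σ_k (d_ij / d_ik)^(2/(m − 1)), where d_ij is the distance from object i to center j. The sketch below assumes Euclidean distance, the common fuzzifier m = 2 and a deterministic farthest-point initialization. These are illustrative choices, not necessarily those of the paper. With c = 3, the three membership columns could play the role of the "attractive", "non-attractive" and "uncertain" sets.

```python
import math


def fuzzy_c_means(points, c, m=2.0, max_iter=100, tol=1e-9):
    """Fuzzy c-means on a list of equal-length tuples.

    Returns (centers, u) where u[i][j] is the membership of point i in cluster j;
    each row of u sums to 1.
    """
    def dist(p, q):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

    def memberships(centers):
        exp = 2.0 / (m - 1.0)
        u = []
        for p in points:
            d = [dist(p, ctr) for ctr in centers]
            if min(d) == 0.0:
                # point coincides with a center: full membership there
                hits = [1.0 if dj == 0.0 else 0.0 for dj in d]
                u.append([h / sum(hits) for h in hits])
            else:
                u.append([1.0 / sum((d[j] / d[k]) ** exp for k in range(c)) for j in range(c)])
        return u

    # Deterministic farthest-point initialization (an illustrative choice).
    centers = [tuple(points[0])]
    while len(centers) < c:
        centers.append(tuple(max(points, key=lambda p: min(dist(p, q) for q in centers))))

    dim = len(points[0])
    for _ in range(max_iter):
        u = memberships(centers)
        new_centers = []
        for j in range(c):
            w = [row[j] ** m for row in u]
            sw = sum(w)
            new_centers.append(tuple(sum(wi * p[d] for wi, p in zip(w, points)) / sw
                                     for d in range(dim)))
        shift = max(dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, memberships(centers)


# Usage on two made-up groups of two-dimensional objects:
pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, u = fuzzy_c_means(pts, c=2)
```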
Iwona Staniec – Classifying trees in evaluation of the inborrower's creditworthiness
This article presents the evaluation of credit risk using a classification tree. It is based on data
obtained from the „Gwarant” company on individual credits for the purchase of television sets. The
sample contained 4746 credit contracts. Of these, 250 contracts (5.268%) were not repaid and 4496
contracts (94.732%) were paid off. The default rate in the analysed sample was therefore only 5.268%.
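The class shares quoted above follow directly from the counts (250 / 4746 ≈ 5.268%). Classification trees typically grow by choosing splits that reduce node impurity. The sketch below uses the Gini index, the criterion of CART-style trees. This is an assumption, since the abstract does not name the splitting rule. The child-node counts in the usage example are hypothetical.

```python
def class_shares(counts):
    """Share of each class in a node, e.g. {'repaid': 0.947, 'not_repaid': 0.053}."""
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}


def gini_impurity(counts):
    """Gini index 1 - sum(p_k^2); 0 for a pure node."""
    return 1.0 - sum(p * p for p in class_shares(counts).values())


def split_gain(parent, children):
    """Decrease in size-weighted Gini impurity produced by a candidate split."""
    n = sum(parent.values())
    weighted = sum(sum(ch.values()) / n * gini_impurity(ch) for ch in children)
    return gini_impurity(parent) - weighted


root = {"repaid": 4496, "not_repaid": 250}
default_rate_pct = round(100 * class_shares(root)["not_repaid"], 3)

# Hypothetical split of the root node by some borrower attribute:
left = {"repaid": 4000, "not_repaid": 50}
right = {"repaid": 496, "not_repaid": 200}
gain = split_gain(root, [left, right])
```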
Mirosława Gazińska – Comparative analysis for voivodships with regard to level of
The aim of the paper is to classify the new voivodships (i.e. those formed after the administrative
reform) in the years 1995-2002 with regard to selected demographic processes. The levels of
demographic development are analysed on the basis of three groups of diagnostic features: 1) population
structure by age and sex, 2) mortality, and 3) marriage and procreative behaviour. The application
of a synthetic measure leads to an ordering of voivodships with regard to demographic development
within the selected groups of diagnostic features and for all features in total.
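The abstract does not specify the synthetic measure. One construction common in Polish taxonomic studies is shown here as an assumption. It normalizes each diagnostic feature by zero unitarization: (x − min)/(max − min) for stimulants and (max − x)/(max − min) for destimulants. It then averages the normalized values with equal weights. Feature names and values in the usage example are hypothetical.

```python
def zero_unitarize(values, stimulant=True):
    """Rescale a feature to [0, 1]; 1 is always the 'best' value."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("feature is constant; zero unitarization is undefined")
    if stimulant:
        return [(v - lo) / (hi - lo) for v in values]
    return [(hi - v) / (hi - lo) for v in values]


def synthetic_measure(features, stimulants):
    """Equal-weight mean of zero-unitarized features.

    features:   dict name -> list of values (one value per object)
    stimulants: dict name -> True (stimulant) or False (destimulant)
    """
    normalized = [zero_unitarize(features[name], stimulants[name]) for name in features]
    n_obj = len(normalized[0])
    return [sum(col[i] for col in normalized) / len(normalized) for i in range(n_obj)]


# Hypothetical features for three regions:
features = {"life_expectancy": [72.0, 74.5, 73.0], "infant_mortality": [8.1, 6.0, 7.2]}
stimulants = {"life_expectancy": True, "infant_mortality": False}
scores = synthetic_measure(features, stimulants)
ranking = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
```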
Janusz Korol, Przemysław Szczuciński – Analysis and diagnosis of sustainable development
level as compared to EU countries
First, the paper addresses the concept of sustainable development in order to evaluate Poland's
position compared with other EU countries by determining its distance to those countries. The research
results confirm the existence of disproportions in development between Poland and other EU member
countries. Second, selected relations among economic, social and environmental indicators were
estimated and verified (using structural equation modelling) in an attempt to find regularities in
sustainable development in EU countries.
Dorota Kwiatkowska-Ciotucha, Urszula Załuska, Józef Dziechciarz – Job satisfaction as an
assessment criterion of labour market policy efficiency. Classification approach
The authors propose a new approach to assessing the efficiency of labour market policy. The proposal
focuses on the main actors of the labour market, the employees, and uses their perception of
job-related satisfaction as the measure of policy quality. Data on job-related satisfaction from the
ECHP database (European Community Household Panel), which covers some 150 thousand respondents from
the 15 old EU member states, were used. TwoStep Cluster Analysis was used to classify the respondents,
and a multivariate statistical analysis framework was used for cross-national comparisons and to
identify Best European Practice.
Mirosława Lasek, Piotr Sylwesiuk – Absorption of EU funds – comparative analysis of
financial situation for rural communities of the Podlaskie voivodship
The aim of the paper was to analyse the connection between the financial situation of communities
and the absorption of EU funds. It seems that communities in a better financial situation, which have
more money to contribute to co-financing projects from the EU budget, should apply for EU funds more
often than communities in a poor financial situation. To achieve this aim, the authors compared the
financial situation of communities (using the Zero Unitarization Method) with their absorption
of EU funds. The research covers rural communities of the Podlaskie voivodship in the years 1993-
Monika Rozkrut, Dominik Rozkrut – Analysis of the relative development of districts of the
north-western province of Poland in the period of 1999-2003
We study the level of development of districts in the north-western region of Poland, one of the six
regions established in Poland according to the Nomenclature of Territorial Units for Statistics
(NUTS). Major socio-economic regions (NUTS 1) are used for analysing regional Community problems and
for appraising eligibility for aid from the Structural Funds. In 2004 a governmental commission
decided how to divide Poland into six NUTS 1 regions, formed as groups of voivodships. Afterwards,
authorities of voivodships whose development was lagging behind expressed concerns about being joined
with wealthier voivodships in one region, which statistically improves their socio-economic situation
(and reduces the expected flows of structural funds). In the paper we tried to verify these concerns.
The objects of the study were the districts of the north-western region. We evaluated the level of
economic development and its spatial diversification using a synthetic taxonomic measure of
development, tree clustering and k-means clustering.
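A classic example of a synthetic taxonomic measure of development is Hellwig's measure. The features are standardized, and a development pattern is formed from the best value of each feature. Each object's distance d_i to the pattern is then rescaled as m_i = 1 − d_i / d_0, with d_0 = mean(d) + 2·sd(d). The sketch below assumes this variant, Euclidean distance and population standard deviations. The abstract does not state whether the paper uses exactly this form. The data in the usage example are made up.

```python
import math


def standardize(col):
    """Z-scores using the population standard deviation."""
    mean = sum(col) / len(col)
    sd = math.sqrt(sum((v - mean) ** 2 for v in col) / len(col))
    if sd == 0:
        raise ValueError("constant feature cannot be standardized")
    return [(v - mean) / sd for v in col]


def hellwig_measure(rows, stimulants):
    """rows: one list of feature values per object (e.g. per district).
    stimulants[k] is True if larger values of feature k mean higher development."""
    cols = [standardize([r[k] for r in rows]) for k in range(len(rows[0]))]
    # development pattern: best standardized value of each feature
    pattern = [max(col) if stim else min(col) for col, stim in zip(cols, stimulants)]
    d = [math.sqrt(sum((cols[k][i] - pattern[k]) ** 2 for k in range(len(cols))))
         for i in range(len(rows))]
    d_mean = sum(d) / len(d)
    d_sd = math.sqrt(sum((x - d_mean) ** 2 for x in d) / len(d))
    d0 = d_mean + 2 * d_sd
    return [1 - di / d0 for di in d]


# Made-up data: feature 0 is a stimulant, feature 1 a destimulant.
scores = hellwig_measure([[1, 5], [2, 4], [3, 3]], [True, False])
```

Values close to 1 indicate objects close to the development pattern. The scores can then be ordered or grouped, for instance by tree or k-means clustering.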
Danuta Strahl, Małgorzata Markowska – Classification of the European space innovation
Contemporary world development is, to a great extent, determined by innovation processes. The
enlarged European Union assumes that achieving a high level of innovation, which facilitates effective
competition on the global market, is the strategic objective for the coming years. The study
therefore assesses the diversification of innovation levels in the European Union countries by means
of 11 indicators. The European Union countries were classified in order to distinguish groups that are
similar with regard to human resources for innovation, the creation of new knowledge and the financing
of innovation processes.
Hanna Dudek – Detecting collinearity by using centered, noncentered and generalized
variance inflation factors
In this article we analyse the concept of variance inflation factors (VIF). The centered VIF is
misleading when one of the regressors has small variation; in such cases the noncentered VIF should be
applied. To determine collinearity for a set of related regressors, such as the dummy regressors
corresponding to a categorical variable, the generalized variance inflation factor is applied.
Moreover, methods of selecting explanatory variables are analysed in the context of collinearity. We
demonstrate that Hellwig's method and Bartosiewicz's method can select variables whose vectors of
observations are linearly dependent.
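Both variants can be read off the inverse of a cross-product matrix. The centered VIF of regressor j is the j-th diagonal element of the inverse correlation matrix of the regressors, without the intercept. The noncentered VIF repeats the computation on columns scaled to unit length but not centered, with the column of ones included. A minimal pure-Python sketch follows, with hypothetical function names. The generalized VIF for groups of regressors is not implemented.

```python
import math


def invert(a):
    """Inverse of a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(a)
    m = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular (exact collinearity)")
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def _vifs(columns, center):
    """Diagonal of the inverse of the cross-product matrix of unit-length columns."""
    k, n = len(columns), len(columns[0])
    scaled = []
    for col in columns:
        mean = sum(col) / n if center else 0.0
        dev = [v - mean for v in col]
        norm = math.sqrt(sum(d * d for d in dev))
        scaled.append([d / norm for d in dev])
    cross = [[sum(a * b for a, b in zip(scaled[i], scaled[j])) for j in range(k)]
             for i in range(k)]
    inv = invert(cross)
    return [inv[j][j] for j in range(k)]


def centered_vifs(columns):
    """Centered VIFs; pass the regressors WITHOUT the intercept column."""
    return _vifs(columns, center=True)


def noncentered_vifs(columns):
    """Noncentered VIFs; pass the regressors WITH the column of ones."""
    return _vifs(columns, center=False)


# A regressor that barely varies around 100 is nearly collinear with the intercept:
x = [100.0, 100.1, 99.9, 100.05]
nc = noncentered_vifs([[1.0, 1.0, 1.0, 1.0], x])
c = centered_vifs([x])
```

In this usage example the centered VIF of the single regressor equals 1 and so signals nothing, while its noncentered VIF is very large. This mirrors the situation described in the abstract.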
Krzysztof Dziekoński, Danuta Tarka – Testing Granger causality on selected capital market
This paper presents the results of an analysis of the potential influence of changes in the DJIA,
FTSE100, CAC40 and DAX30 stock market indices on changes in the value of the Polish stock market
index WIG20. The results show that the WIG20 index depends on the values of the DJIA, FTSE100,
CAC40 and DAX30. Dependence was considered in accordance with the definition of Granger causality.
The Granger and Sims-GMW tests were used in the analysis, which was based on daily opening and
closing values of the indices from 1996 to 2003.
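Granger causality from x to y is usually tested by comparing two autoregressions. The restricted one regresses y on its own lags, and the unrestricted one adds lags of x. The comparison uses the F statistic F = ((RSS_r − RSS_u)/p) / (RSS_u/(n − k_u)). The sketch below implements that comparison in plain Python with a hypothetical lag order p. It does not reproduce the Sims-GMW test. The p-value, which comes from the F(p, n − k_u) distribution, is left to a statistics library. The usage example uses simulated series, not the index data.

```python
import random


def ols_rss(X, y):
    """Residual sum of squares of an OLS fit, solving the normal equations X'X b = X'y."""
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    # Gauss-Jordan elimination on the augmented matrix [X'X | X'y]
    a = [row[:] + [b] for row, b in zip(xtx, xty)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(k):
            if r != col:
                f = a[r][col]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[col])]
    beta = [a[i][k] for i in range(k)]
    return sum((yi - sum(b * xi for b, xi in zip(beta, r))) ** 2 for r, yi in zip(X, y))


def granger_f(y, x, p=1):
    """F statistic for H0: p lags of x add nothing to predicting y given p lags of y."""
    rows_r, rows_u, target = [], [], []
    for t in range(p, len(y)):
        lags_y = [y[t - i] for i in range(1, p + 1)]
        lags_x = [x[t - i] for i in range(1, p + 1)]
        rows_r.append([1.0] + lags_y)
        rows_u.append([1.0] + lags_y + lags_x)
        target.append(y[t])
    rss_r, rss_u = ols_rss(rows_r, target), ols_rss(rows_u, target)
    n, k_u = len(target), 1 + 2 * p
    return ((rss_r - rss_u) / p) / (rss_u / (n - k_u))


# Simulated series in which x leads y by one period:
rng = random.Random(1)
x = [rng.gauss(0, 1) for _ in range(200)]
y = [0.0] + [0.8 * x[t - 1] + 0.1 * rng.gauss(0, 1) for t in range(1, 200)]
f_x_to_y = granger_f(y, x, p=1)
```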
Elżbieta Gołata, Grażyna Dehnel – Evaluation of local labour market situation according to
different data sources
The paper deals with the problem of convergence in the measurement of unemployment at the local scale,
across the poviats of the Wielkopolska voivodship. The measures of unemployment are estimated from
different sources of information. We take into account databases such as the unemployment register,
the Labour Force Survey and the Population Census 2002, as well as indirect estimates made by the
Central Statistical Office and our own estimates.
In the analysis we discuss the availability of information on the economic activity of the local
population. Different types of measures and their properties when applied at the local scale are
analysed. The relation between registered unemployment and unemployment according to the ILO
definition is examined, and conclusions for the evaluation of the unemployment rate are drawn.
Witold Hantke – The classification of branch labour markets in Silesian province according to
job supply and demand
The Silesian labour market is highly differentiated internally. Particular professions differ from one
another both in the size of job supply and demand and in their regional structure. In this paper,
groups of professions have been divided into homogeneous classes using cluster and discriminant
analysis. A ranking of professions has also been created by applying linear ordering techniques. The
research confirms that unskilled workers, compared with specialists in various fields, are in the
worst situation.
Małgorzata Wasiuk – Risk in effectiveness assessment of energetic investment based on
The economic effectiveness of an investment decision can be evaluated on the basis of NPV. The
effectiveness assessment concerns future effects, which are based on estimated incomes and expenses
(over 10-25 years). Indexes calculated for simulated future effects are burdened with uncertainty,
which involves the risk that the effects will differ from expectations.
This paper presents the results of risk assessment research as one element of overall investment
planning. The distribution of the NPV index was modelled by means of the Monte Carlo method, and the
parameters of the distribution were estimated. The relationships between the NPV index and the
extreme values of the individual distributions were described.
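A minimal Monte Carlo sketch of the NPV distribution, NPV = −I_0 + Σ_t CF_t / (1 + r)^t. It assumes, purely for illustration, independent normally distributed annual cash flows whose standard deviation is a fixed share of their mean. The discount rate, outlay, horizon and cash flows in the usage example are hypothetical and are not data from the paper.

```python
import random
import statistics


def npv(rate, initial_outlay, cash_flows):
    """NPV = -I0 + sum over t = 1..T of CF_t / (1 + rate)^t."""
    return -initial_outlay + sum(cf / (1 + rate) ** t
                                 for t, cf in enumerate(cash_flows, start=1))


def simulate_npv(rate, initial_outlay, mean_flows, sd_share, n_sims=10_000, seed=0):
    """Draw each year's cash flow from N(mu, (sd_share * |mu|)^2) and collect NPVs."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_sims):
        flows = [rng.gauss(mu, sd_share * abs(mu)) for mu in mean_flows]
        results.append(npv(rate, initial_outlay, flows))
    return results


# Hypothetical project: outlay 1000, 15 years of expected inflows of 150, 8% discount rate.
sims = simulate_npv(0.08, 1000.0, [150.0] * 15, sd_share=0.2)
mean_npv = statistics.mean(sims)
sd_npv = statistics.stdev(sims)
prob_loss = sum(v < 0 for v in sims) / len(sims)   # estimated P(NPV < 0)
q05 = statistics.quantiles(sims, n=20)[0]          # 5th percentile of simulated NPV
```

The share of negative simulated NPVs and the lower quantiles summarize the risk that the effects will fall short of expectations. Other cash-flow distributions can be substituted in `simulate_npv` without changing the rest of the sketch.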