Fuzzy clustering has been discussed in order to reflect the pervasiveness of imprecision and uncertainty that exists in the real world. The amount of data is growing, and we face the challenge of analyzing, processing, and extracting useful information from such complicated and vast data. Many fuzzy clustering methods have been developed to deal with such data. This paper outlines some fuzzy clustering methods whose target data are asymmetric similarity data or interval-valued data. Moreover, a method for interpreting the fuzzy clustering result is described.
A logistic regression model using the random subspace method is investigated through experiments. Ensemble learning is known to be one of the better prediction methods. The framework makes it possible to improve the precision of prediction; however, interpretation of the model often becomes difficult. By combining the random subspace method with a logistic regression model, this paper attempts to provide a solution to this problem. The improvement in precision is verified in a preliminary experiment. Furthermore, the meaning of the combined model is easily understandable by means of the coefficients of the prediction model.
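As a rough illustration of this scheme, the following sketch builds a random-subspace ensemble of logistic regression models with scikit-learn's BaggingClassifier; the dataset, subspace fraction, and ensemble size are illustrative choices, not the paper's experimental setup.

```python
# Minimal sketch of a random-subspace ensemble of logistic regression
# models, built with scikit-learn's BaggingClassifier.  Dataset, subspace
# fraction, and ensemble size are illustrative, not the paper's setup.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
base = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Random subspace method: each base model sees only a random subset of the
# features, while all training instances are kept (bootstrap=False).
ensemble = BaggingClassifier(
    base,
    n_estimators=50,
    max_features=0.5,       # fraction of features given to each base model
    bootstrap=False,
    random_state=0,
)

print("single model            :", round(cross_val_score(base, X, y, cv=5).mean(), 3))
print("random subspace ensemble:", round(cross_val_score(ensemble, X, y, cv=5).mean(), 3))
```

Each fitted base model's coefficients refer only to its own feature subset (recorded in `estimators_features_`), so they can be mapped back to the original variables and aggregated, which is one way to read off the meaning of the combined model.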
Research was carried out on infant infectious diseases for the purpose of grasping occurrence trends and detecting epidemics early. Using the patient information data of the infectious disease surveillance program, we examined how to grasp the current state of patient occurrence by area, its spread to neighboring zones, and the spread of epidemics to other prefectures. Moreover, an algorithm for predicting epidemics was examined.
Dimensionality reduction is one of the important preprocessing steps in high-dimensional data analysis. In this paper, we consider the supervised dimensionality reduction problem, i.e., samples are accompanied with class labels. Fisher discriminant analysis (FDA) is a traditional but powerful technique for linear supervised dimensionality reduction. However, FDA tends to give undesired results if samples in a class are multimodal. Locality-preserving projection (LPP) allows us to reduce the dimensionality of multimodal data without losing the local structure. However, LPP is an unsupervised method and is not necessarily effective in supervised learning scenarios. In this paper, we propose a new linear supervised dimensionality reduction method called local Fisher discriminant analysis (LFDA). LFDA effectively combines the ideas of FDA and LPP and works well for dimensionality reduction of multimodal labeled data. LFDA has an analytic form of the embedding transformation and the solution can be easily computed just by solving a generalized eigenvalue problem. This is an advantage over recently proposed supervised dimensionality reduction methods. We demonstrate the practical usefulness and high scalability of the LFDA method in data visualization and classification tasks through extensive simulation studies. We also show that LFDA can be extended to non-linear dimensionality reduction scenarios by applying the kernel trick.
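The following sketch illustrates the generalized-eigenvalue formulation that FDA and LFDA share, using the ordinary global scatter matrices on a standard dataset; LFDA itself replaces these with locally weighted versions built from an affinity matrix, which is not reproduced here.

```python
# Minimal sketch of the generalized-eigenvalue view shared by FDA and LFDA.
# Ordinary (global) FDA scatter matrices are used; LFDA replaces them with
# locally weighted scatter matrices so that multimodal structure within each
# class is preserved.  Illustrative only.
import numpy as np
from scipy.linalg import eigh
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)
n, d = X.shape
mean_total = X.mean(axis=0)

S_w = np.zeros((d, d))   # within-class scatter
S_b = np.zeros((d, d))   # between-class scatter
for c in np.unique(y):
    Xc = X[y == c]
    mean_c = Xc.mean(axis=0)
    S_w += (Xc - mean_c).T @ (Xc - mean_c)
    diff = (mean_c - mean_total).reshape(-1, 1)
    S_b += len(Xc) * diff @ diff.T

# Embedding directions: generalized eigenvectors of S_b v = lambda S_w v,
# taken in decreasing order of eigenvalue.
eigvals, eigvecs = eigh(S_b, S_w + 1e-8 * np.eye(d))
order = np.argsort(eigvals)[::-1]
T = eigvecs[:, order[:2]]          # d x 2 transformation matrix
Z = X @ T                          # 2-dimensional embedding
print(Z[:5])
```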
We consider principal component analysis for multi-dimensional sparse functional data. The mixed effect model and the reduced rank model have been used for analyzing sparse functional data. In this paper, we introduce a principal component method for multi-dimensional sparse functional data based on the reduced rank model, and model selection is performed using the Akaike information criterion (AIC) and the Bayesian information criterion (BIC). Furthermore, the use of the proposed method is illustrated through the analysis of human gait data and handwriting data.
This article presents flexible methods for modeling censored survival data using penalized smoothing splines when covariate values change over the duration of the study. The Cox proportional hazards model has been widely used for the analysis of treatment and prognostic effects with censored survival data. However, it involves a number of theoretical problems that remain to be solved with respect to the baseline survival function and the baseline cumulative hazard function. The basic idea in this article is to use a logistic regression model and generalized additive models with B-splines, and then estimate the survival function. The proposed methods are illustrated using data from a long-term study of patients with PBC (primary biliary cirrhosis), for the purpose of facilitating the decision as to when to undertake liver transplantation. As an illustration of the graphical evaluation of covariates, the Stanford Heart Transplant data, which were collected to model patient survival, are also used. We model survival time as a function of patient covariates and transplant status, and compare the results obtained using smoothing spline, partial logistic, Cox proportional hazards, and piecewise exponential models.
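A minimal sketch of the discrete-time (partial logistic) idea is given below, assuming synthetic data: subjects are expanded into person-period records, the event indicator is regressed on a B-spline basis of time plus a covariate, and the survival function is obtained as the cumulative product of one minus the fitted hazard. The penalized-spline details and the PBC and Stanford analyses are not reproduced.

```python
# Minimal sketch of a discrete-time (partial logistic) survival model:
# each subject is expanded into one record per interval at risk, the event
# indicator is regressed on a B-spline basis of time plus a covariate, and
# the survival curve is the cumulative product of (1 - hazard).
# Synthetic data; the penalized-spline and PBC analyses are not reproduced.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import SplineTransformer

rng = np.random.default_rng(0)
n = 300
covariate = rng.normal(size=n)                           # e.g. a prognostic score
true_time = rng.exponential(scale=np.exp(-0.7 * covariate)) * 10
censor_time = rng.uniform(2, 12, size=n)
time = np.minimum(true_time, censor_time).astype(int) + 1
event = (true_time <= censor_time).astype(int)

# Person-period expansion: one row per (subject, interval) while at risk.
rows, labels = [], []
for t_i, e_i, x_i in zip(time, event, covariate):
    for k in range(1, t_i + 1):
        rows.append([k, x_i])
        labels.append(1 if (e_i == 1 and k == t_i) else 0)
rows, labels = np.array(rows, dtype=float), np.array(labels)

# B-spline basis for the interval index approximates a smooth baseline hazard.
spline = SplineTransformer(degree=3, n_knots=5, include_bias=False)
design = np.hstack([spline.fit_transform(rows[:, [0]]), rows[:, [1]]])
model = LogisticRegression(max_iter=2000).fit(design, labels)

# Survival function for a subject with covariate value 0.
grid = np.arange(1, 11, dtype=float).reshape(-1, 1)
design_new = np.hstack([spline.transform(grid), np.zeros_like(grid)])
hazard = model.predict_proba(design_new)[:, 1]
survival = np.cumprod(1.0 - hazard)
print(np.round(survival, 3))
```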
This paper proposes an approach to investigating how students behave during a course of practice in medical informatics. Similarities among students' learning behaviors were measured with a tutorial evaluation form consisting of 19 questionnaire items and were analyzed by multidimensional scaling. The results show that the items in the evaluation sheet can be divided into two classes on a two-dimensional plane. One class includes arrangement of knowledge, important themes, fundamental items, common learning items, a goal for learning, and understanding of other people, which are located in the neighborhood of the origin with high similarities. The other class includes time distribution of discussion announcements, time distribution of the learning plan, the order of learning items, and logical explanations, which are located in the regions surrounding the former class. In each year from 2002 to 2004, while self-presentation was not observed, learning behavior with extroversion and other-directedness was observed.
This paper proposes three algorithms for relational frequent pattern mining based on the logical structure of examples. These methods use bottom-up property extraction from examples, and the extracted properties are combined into patterns in a level-wise manner, as in Apriori. Properties are defined in terms of modes of predicates. The algorithms are evaluated in comparison with WARMR.
Studies using collocation are becoming more complex and their applications are expanding. However, calculating collocation scores has not been efficient. In this paper, we propose a method to calculate collocation scores efficiently by using the frequencies of multi-word collocations, and report experimental results.
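For reference, the sketch below computes one common collocation score, pointwise mutual information, directly from frequency counts on a toy corpus; the paper's efficient computation via multi-word collocation frequencies is not reproduced.

```python
# Minimal sketch of one common collocation score, pointwise mutual
# information (PMI), computed from raw frequency counts.  The paper's
# efficient computation via multi-word collocation frequencies is not
# reproduced; the toy corpus below is illustrative.
import math
from collections import Counter

tokens = "the strong tea and the strong coffee and the hot tea".split()
unigrams = Counter(tokens)
bigrams = Counter(zip(tokens, tokens[1:]))
n_uni, n_bi = sum(unigrams.values()), sum(bigrams.values())

def pmi(w1, w2):
    """PMI(w1, w2) = log2( P(w1, w2) / (P(w1) * P(w2)) )."""
    p_joint = bigrams[(w1, w2)] / n_bi
    p_w1 = unigrams[w1] / n_uni
    p_w2 = unigrams[w2] / n_uni
    return math.log2(p_joint / (p_w1 * p_w2))

print(round(pmi("strong", "tea"), 3))
```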
In this paper, we present a method to estimate the minimum number of training instances needed to construct a valid learning model in a rule evaluation support method. In the post-processing phase of the data mining process, the rule evaluation procedure is one of the important and costly procedures, because it requires expertise from human experts. To support such rule evaluation, we have developed a support method based on learning models called rule evaluation models. These models should be valid and constructed from as few training instances as possible to curb learning costs. Therefore, we have evaluated learning costs in terms of the accuracies obtained on an entire training dataset and the achievement rates of sub-sampled training instances. We then show case studies on artificial evaluation results and an actual data mining result.
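The sketch below illustrates the underlying idea under assumed choices of dataset, learner, and a 95% achievement-rate threshold: the accuracy of models trained on growing subsamples is compared with the accuracy obtained on the full training set, and the smallest sufficient size is reported.

```python
# Minimal sketch of estimating the smallest number of training instances
# whose accuracy reaches a given achievement rate of the full-data accuracy.
# Dataset, learner, and the 95% threshold are illustrative choices.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

def accuracy_with(n_samples):
    clf = DecisionTreeClassifier(random_state=0)
    clf.fit(X_tr[:n_samples], y_tr[:n_samples])
    return clf.score(X_te, y_te)

full_acc = accuracy_with(len(X_tr))
for n in range(50, len(X_tr) + 1, 50):
    if accuracy_with(n) >= 0.95 * full_acc:      # 95% achievement rate
        print("sufficient training size:", n, "(full accuracy:", round(full_acc, 3), ")")
        break
else:
    print("needs the full training set of", len(X_tr), "instances")
```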
Total Frequency, proposed by Takano et al., is a useful anti-monotonic measure, which was developed for finding all frequent patterns in a single large-scale data sequence. However, in some application areas, sequences extracted using only a frequency measure are often meaningless, noisy sequences. In this paper, we propose a method based on an information gain measure obtained by combining frequency and self-information. This method can check for and exclude noisy sequences. Note that using only self-information as an extraction measure cannot find important patterns, because subsequences with high self-information hardly ever appear in the base data sequence. It is important to extract pattern sequences that occur neither too many times nor too few times in a single large-scale data sequence.
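A toy sketch of such a combined score follows, assuming a character sequence and fixed-length window patterns; the exact definitions of Total Frequency and of the paper's information gain measure are not reproduced.

```python
# Minimal sketch of scoring subsequence patterns by combining occurrence
# frequency with self-information, so that patterns occurring neither too
# often nor too rarely are favoured.  The exact Total Frequency measure and
# the paper's information-gain definition are not reproduced.
import math
from collections import Counter

sequence = "abcabxabcabyabcab"
length = 3

windows = [sequence[i:i + length] for i in range(len(sequence) - length + 1)]
counts = Counter(windows)
total = len(windows)

scores = {}
for pattern, freq in counts.items():
    p = freq / total
    self_info = -math.log2(p)           # rare patterns carry more information
    scores[pattern] = freq * self_info  # frequency x self-information

for pattern, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(pattern, counts[pattern], round(score, 3))
```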
Web usage mining plays an important role in the adaptation of Web sites and the improvement of Web server performance. It applies data mining techniques to discover Web access patterns from Web usage data. It is interesting to use information on the Web hyperlink structure and Web contents, as well as the Web access log, to discover Web access patterns. In this paper, we propose a unified form to represent all three kinds of Web information as sequences. The form is convenient for applying frequent sequential pattern mining algorithms. In addition, taking the characteristics of the Web into consideration, we give two proposals for frequent sequential pattern mining algorithms.
A graph mining technique called Chunkingless Graph-Based Induction (Cl-GBI) can extract discriminative subgraphs from graph-structured data by an operation called chunkingless pairwise expansion, which constructs pseudo-nodes from selected pairs of nodes in the data. Because of its time and space complexities, Cl-GBI sometimes cannot extract subgraphs that are good enough to describe the characteristics of the data. Thus, to improve its efficiency, we propose a pruning method based on the upper bound of information gain. Information gain is used as the criterion of discriminativity in Cl-GBI, and the upper bound of the information gain of a subgraph is the maximum value that its supergraphs can achieve. The proposed method allows Cl-GBI to exclude from its search space unfruitful subgraphs that cannot yield the most discriminative one, by comparing the upper bound of the information gain of each subgraph at hand with the best information gain found so far. Furthermore, in this paper, we experimentally show the usefulness of the proposed pruning method by applying Cl-GBI with the pruning to both a real-world dataset and artificial datasets.
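The sketch below illustrates the kind of optimistic bound involved, assuming binary class labels and pattern coverage counts: since any supergraph covers a subset of a pattern's occurrences, the best achievable information gain is bounded by the gain obtained when only the covered positives (or only the covered negatives) remain. The Cl-GBI procedure itself is not reproduced.

```python
# Minimal sketch of an "optimistic" upper bound used for pruning in pattern
# mining with information gain: any supergraph of a pattern covers a subset
# of the pattern's occurrences, so by convexity the best possible gain is
# attained when it covers only the positives or only the negatives among
# them.  The Cl-GBI machinery itself is not reproduced.
import math

def entropy(pos, neg):
    total = pos + neg
    if total == 0 or pos == 0 or neg == 0:
        return 0.0
    p = pos / total
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def info_gain(pos_cov, neg_cov, pos_all, neg_all):
    """Information gain of splitting the data by pattern occurrence."""
    n = pos_all + neg_all
    covered = pos_cov + neg_cov
    rest_pos, rest_neg = pos_all - pos_cov, neg_all - neg_cov
    return (entropy(pos_all, neg_all)
            - covered / n * entropy(pos_cov, neg_cov)
            - (n - covered) / n * entropy(rest_pos, rest_neg))

def upper_bound(pos_cov, neg_cov, pos_all, neg_all):
    """Best gain any refinement (supergraph) of the pattern can achieve."""
    return max(info_gain(pos_cov, 0, pos_all, neg_all),
               info_gain(0, neg_cov, pos_all, neg_all))

# Pattern covering 40 positive and 35 negative graphs out of 50/50:
best_so_far = 0.30
if upper_bound(40, 35, 50, 50) < best_so_far:
    print("prune: no supergraph can beat the current best")
else:
    print("keep exploring; bound =", round(upper_bound(40, 35, 50, 50), 3))
```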
The data mining process consists of sub-processes such as pre-processing, the mining process, and post-processing, in each of which many algorithms are available. Usually, a data miner has to iterate such a process, varying the algorithms and their parameters in each sub-process, in order to obtain a satisfactory result. Such iterative processes impose a heavy burden on users. In addition, novices in data mining need to learn many things about the algorithms, such as the meanings of their parameters. Thus, the selection of appropriate algorithms to use on a new dataset is an important issue. In this paper, we propose a user support system based on case-based reasoning, which recommends algorithms that seem to be appropriate for a new dataset at hand, based on the most similar cases in its case base. The similarity in our system takes into account not only superficial features of a dataset, such as the number of instances in a class, but also its contents. In addition, the system utilizes rules about the effects of algorithms, learned from the case base, in order to refine the candidate algorithms it selects. We also experimentally show the usefulness of the proposed system by comparing it with an existing method based on case-based reasoning.
The World Wide Web (WWW) has been regarded as one of the most important information sources. We are able to obtain much information from the WWW using search engines and acquire some knowledge from it. However, it is difficult to grasp a huge amount of knowledge in a short time using only search engines. We propose an animation interface to support knowledge reconfirmation. We define the knowledge as the differences between keyword relationships. The interface has three functions: (1) displaying a map of keyword relationships, (2) switching maps depending on the viewpoint, and (3) displaying an animation between two maps. Experimental results showed that the interface is able to support knowledge reconfirmation.
Correlation analysis among variables is frequently used in various approaches in statistics and data mining. However, its application to data obtained from recent ubiquitous sensing systems consisting of massive numbers of sensors is often intractable, since its computational complexity is proportional to the square of the number of variables. On the other hand, strong correlations among variables are usually sparse in various kinds of data, such as small-world data. In this report, we propose a novel method to efficiently estimate the correlations among massive numbers of variables under this sparseness. Experimental evaluations show excellent efficiency compared with the direct computation of all correlation coefficients.
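As one illustrative approach to exploiting such sparseness (not the authors' algorithm), the sketch below screens variable pairs with a cheap random-projection approximation of the correlation and computes the exact coefficient only for pairs passing a threshold.

```python
# Illustrative sketch (not the paper's algorithm): screen variable pairs
# with a cheap random-projection approximation of the correlation and
# compute the exact coefficient only for pairs that pass a threshold.
# Under sparseness of strong correlations, far fewer than d*(d-1)/2 exact
# computations are needed.
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 500, 200, 40          # samples, variables, projection dimension

# Synthetic data in which only a few variable pairs are strongly correlated.
X = rng.normal(size=(n, d))
X[:, 1] = X[:, 0] + 0.1 * rng.normal(size=n)
X[:, 7] = -X[:, 5] + 0.1 * rng.normal(size=n)

Z = (X - X.mean(axis=0)) / X.std(axis=0)        # standardize: corr = Z_i.Z_j / n
R = rng.normal(size=(n, k)) / np.sqrt(k)        # random projection matrix
S = Z.T @ R                                     # d x k sketch of the variables

approx = (S @ S.T) / n                          # approximate correlation matrix
threshold = 0.6
candidates = np.argwhere(np.triu(np.abs(approx), k=1) > threshold)

for i, j in candidates:
    exact = float(Z[:, i] @ Z[:, j]) / n
    if abs(exact) > threshold:
        print(f"corr(x{i}, x{j}) = {exact:.3f}")
print("exact computations:", len(candidates), "of", d * (d - 1) // 2)
```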
Several studies have investigated efficient algorithms to detect highly correlated itemset pairs. In contrast, we focus on itemset pairs with even a medium degree of correlation in a target database, provided the correlation is drastically higher than the corresponding one in another database to be contrasted. We consider that such a large change in correlation can be evidence that something remarkable occurs implicitly in the target database. Among the problems of finding such itemset pairs, we consider the case where one component is given by the user; for the given component, we try to find the other component. Because of the non-monotonicity of the degree of correlation change, the problem of finding the other component is difficult. However, we prove that some monotonicity holds if we consider certain itemsets in the process of mining the other component.
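The sketch below only illustrates the quantity of interest, assuming the lift as the correlation measure and toy transaction databases: the change of correlation between the target and contrasted databases for pairs sharing a user-given component. The mining algorithm and monotonicity results are not reproduced.

```python
# Minimal sketch of the quantity of interest: the change of a correlation
# measure (here the lift) for an itemset pair between a target database and
# a contrasted database, with one component fixed by the user.  The paper's
# mining algorithm and monotonicity results are not reproduced.
def lift(transactions, left, right):
    n = len(transactions)
    both = sum(1 for t in transactions if left <= t and right <= t)
    only_l = sum(1 for t in transactions if left <= t)
    only_r = sum(1 for t in transactions if right <= t)
    if both == 0 or only_l == 0 or only_r == 0:
        return 0.0
    return (both / n) / ((only_l / n) * (only_r / n))

target = [{"a", "b", "c"}, {"a", "b"}, {"a", "b", "d"}, {"c", "d"}]
contrast = [{"a", "c"}, {"b", "d"}, {"a", "d"}, {"b", "c"}, {"a", "b"}]

fixed = {"a"}                       # component given by the user
for candidate in [{"b"}, {"c"}, {"d"}]:
    change = lift(target, fixed, candidate) - lift(contrast, fixed, candidate)
    print(candidate, "change in lift:", round(change, 3))
```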
Covariances of categorical variables are defined using a regular simplex expression for the categories. The method follows the variance definition by Gini, and it gives the covariance as the solution of simultaneous equations solved by the Newton method. The calculated results give reasonable values for test data. A method of principal component analysis (RS-PCA) using regular simplex expressions is also proposed, which allows easy interpretation of the principal components.
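The sketch below illustrates the regular-simplex encoding in a simplified form, assuming the k unit vectors in R^k as mutually equidistant vertices and toy categorical data; the Newton-method solution of the simultaneous covariance equations is not reproduced.

```python
# Illustrative sketch of the regular-simplex idea: each k-level categorical
# variable is mapped to the vertices of a regular simplex (here the k unit
# vectors in R^k, which are mutually equidistant), and covariance / PCA is
# then computed on the resulting coordinates.  The Newton-method solution
# of the simultaneous covariance equations in the paper is not reproduced.
import numpy as np

def simplex_encode(column):
    """Map category labels to vertices of a regular simplex (one-hot form)."""
    levels = sorted(set(column))
    vertices = np.eye(len(levels))          # k mutually equidistant points
    index = {level: i for i, level in enumerate(levels)}
    return np.array([vertices[index[v]] for v in column])

# Two categorical variables observed on six individuals (toy data).
color = ["red", "blue", "red", "green", "blue", "red"]
size = ["S", "L", "S", "M", "L", "M"]

encoded = np.hstack([simplex_encode(color), simplex_encode(size)])
centered = encoded - encoded.mean(axis=0)

cov = centered.T @ centered / (len(color) - 1)   # covariance of the coordinates
eigvals, eigvecs = np.linalg.eigh(cov)
print("leading principal component:", np.round(eigvecs[:, -1], 3))
print("explained variance ratio   :", np.round(eigvals[-1] / eigvals.sum(), 3))
```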