Package 'hclusteasy' reference manual

Title:	Determining Hierarchical Clustering Easily
Description:	Facilitates hierarchical clustering analysis with functions to read data in 'txt', 'xlsx', and 'xls' formats, apply normalization techniques to the dataset, perform hierarchical clustering and construct scatter plot from principal component analysis to evaluate the groups obtained.
Authors:	André Nogueira [aut], Henrique Andrade [aut, cre]
Maintainer:	Henrique Andrade <[email protected]>
License:	GPL-2
Version:	0.1.0
Built:	2025-03-22 05:15:42 UTC
Source:	https://github.com/tsukubai/hclusteasy

Generate and Select Groups with Hierarchical Clustering

Description

Perform hierarchical clustering and generate groups based on sample dissimilarity using the Euclidean method.

Usage

hca(data, method = "complete", num.groups = 3)
hca(data, method = "complete", num.groups = 3)

Arguments

`data`	Dataset in `data.frame` format.
`method`	Method of hierarchical clustering, considering: "ward.D", "ward.D2", "single", "complete", "average" (UPGMA), "mcquitty" (WPGMA), "median" (WPGMC) or "centroid" (UPGMC). Default is "complete".
`num.groups`	Number of groups to cut. Default is three.

Value

A vector of integers, where each element represents the group assigned to each observation in the original dataset.

Examples

# Load the required package
library(hclusteasy)


# Read the 'iris' dataset from the package
data("iris_uci")

# Remove column 'Species' from the iris dataset
iris <- iris_uci[, -5]


# Apply hierarchical cluster and selecting groups
g <- hca(iris)

# Load the required package
library(hclusteasy)


# Read the 'iris' dataset from the package
data("iris_uci")

# Remove column 'Species' from the iris dataset
iris <- iris_uci[, -5]


# Apply hierarchical cluster and selecting groups
g <- hca(iris)

Iris Dataset

Description

This dataset contains 150 flower samples distributed among 3 iris species classes: Setosa, Versicolor, and Virginica. It consists of 5 columns, including 4 attributes measured in centimeters: sepal length and width, and petal length and width, along with a column indicating the iris species. This dataset was introduced by Ronald A. Fisher in 1936 in his classic paper on linear discriminant analysis.

Usage

data("iris_uci")data("iris_uci")

Source

https://archive.ics.uci.edu

References

Fisher, R. A. (1988). Iris. UCI Machine Learning Repository. doi:10.24432/C56C76.

Apply Normalization Techniques to the Dataset

Description

Perform data normalization.

Usage

normalization(data, type = "n0", norm = "column", na.remove = FALSE)
normalization(data, type = "n0", norm = "column", na.remove = FALSE)

Arguments

`data`	Dataset in `data.frame` format.
`type`	Type of normalization. Default is "n1". n0: without normalization n1: standardization ((x-mean)/sd) n2: positional standardization ((x-median)/mad) n3: unitization ((x-mean)/range) n3a: positional unitization ((x-median)/range) n4: unitization with zero minimum ((x-min)/range) n5: normalization in range <-1,1> ((x-mean)/max(abs(x-mean))) n5a: positional normalization in range <-1,1> ((x-median)/max(abs(x-median))) n6: quotient transformation (x/sd) n6a: positional quotient transformation (x/mad) n7: quotient transformation (x/range) n8: quotient transformation (x/max) n9: quotient transformation (x/mean) n9a: positional quotient transformation (x/median) n10: quotient transformation (x/sum) n11: quotient transformation (x/sqrt(SSQ)) n12: normalization ((x-mean)/sqrt(sum((x-mean)^2))) n12a: positional normalization ((x-median)/sqrt(sum((x-median)^2))) n13: normalization with zero being the central point ((x-midrange)/(range/2))
`norm`	Defines whether the normalization will be done by "column" or by "row". Default is "column".
`na.remove`	A `logical` value indicating whether NA values should be excluded before performing normalization calculations. Default is FALSE.

Value

Normalized dataset in data.frame foramt.

Examples

# Load the required package
library(hclusteasy)


# Read the dataset 'iris' from the package
data("iris_uci")

# Remove the column 'Species' from the iris dataset
iris <- iris_uci[, -5]


# Apply normalization to the iris dataset
irisN <- normalization(iris, type = "n1")

# Load the required package
library(hclusteasy)


# Read the dataset 'iris' from the package
data("iris_uci")

# Remove the column 'Species' from the iris dataset
iris <- iris_uci[, -5]


# Apply normalization to the iris dataset
irisN <- normalization(iris, type = "n1")

Plot Principal Component Analysis Results

Description

Apply PCA (Principal Component Analysis) to the data and construct a scatter plot of the first two principal components.

Usage

pca(data, groups = "none")
pca(data, groups = "none")

Arguments

`data`	Dataset in `data.frame` format.
`groups`	Groups to color observations and draw ellipses around each group of samples with a confidence level of 0.98. Default is "none".

Value

A ggplot.

Examples


# Load the required package
library(hclusteasy)


# Read the 'iris' dataset from the package
data("iris_uci")

# Select column "Species" (groups) in the iris dataset
species <- iris_uci[, 5]

# Remove column "Species" in the iris dataset
iris <- iris_uci[, -5]


# Apply pca and ploting the two firsts components without groups
pca(iris)

# Apply pca and ploting the first two components with groups
pca(iris, groups = species)

# Load the required package
library(hclusteasy)


# Read the 'iris' dataset from the package
data("iris_uci")

# Select column "Species" (groups) in the iris dataset
species <- iris_uci[, 5]

# Remove column "Species" in the iris dataset
iris <- iris_uci[, -5]


# Apply pca and ploting the two firsts components without groups
pca(iris)

# Apply pca and ploting the first two components with groups
pca(iris, groups = species)

Read Files in txt, xls, or xlsx Formats

Description

Read datasets files in txt(space-separated), xls or xlsx and return the data as a data.frame.

Usage

read.data(path, col.names = FALSE, col.types = NULL)
read.data(path, col.names = FALSE, col.types = NULL)

Arguments

`path`	Path to the `txt`(space-separated), `xls` or `xlsx` file.
`col.names`	Logical value indicating whether the first row of the dataset should be used as column names. Use `TRUE` to use the first row as column names or `FALSE` otherwise. Default is `FALSE`.
`col.types`	Character or a character vector specifying the data types for each column. Possible values are: "skip" , "guess" , "logical" , "numeric", "date" , "text" , or "list" . Default, it is `NULL`, which means the data types will be determined automatically ("guess"). Note that `txt` files do not support the `col.types` parameter.

Value

Dataset in data.frame format.

Examples

# Load the package
library(hclusteasy)

# Set the file path
file_path <- system.file("extdata",
                          "iris_uci.xlsx",
                           package = "hclusteasy")


# Read a .xlsx dataset
iris <- read.data(file_path,col.names = TRUE)

# Load the package
library(hclusteasy)

# Set the file path
file_path <- system.file("extdata",
                          "iris_uci.xlsx",
                           package = "hclusteasy")


# Read a .xlsx dataset
iris <- read.data(file_path,col.names = TRUE)

Wine Dataset

Description

It consists of a dataset containing 178 wine samples distributed into 3 distinct classes. It has 14 columns, comprising 13 chemical attributes such as alcohol content, malic acid amount, ash, alkalinity of ash, magnesium, phenols, flavonoids, proanthocyanins, color intensity, hue, OD280/OD315 ratio, and proline, along with one column indicating the wine class. This dataset was introduced by Forina et al. in 1991 in a study on the chemical analysis of wines grown in the regions of Italy.

Usage

data("wine_uci")data("wine_uci")

Source

https://archive.ics.uci.edu

References

Aeberhard, Stefan and Forina, M. (1991). Wine. UCI Machine Learning Repository. doi:10.24432/C5PC7J.

Package 'hclusteasy'

Help Index

Generate and Select Groups with Hierarchical Clustering

Description

Usage

Arguments

Value

Examples

Iris Dataset

Description

Usage

Source

References

Apply Normalization Techniques to the Dataset

Description

Usage

Arguments

Value

Examples

Plot Principal Component Analysis Results

Description

Usage

Arguments

Value

Examples

Read Files in txt, xls, or xlsx Formats

Description

Usage

Arguments

Value

Examples

Wine Dataset

Description

Usage

Source

References