Title: | Determining Hierarchical Clustering Easily |
---|---|
Description: | Facilitates hierarchical clustering analysis with functions to read data in 'txt', 'xlsx', and 'xls' formats, apply normalization techniques to the dataset, perform hierarchical clustering, and construct a scatter plot from principal component analysis to evaluate the groups obtained. |
Authors: | André Nogueira [aut], Henrique Andrade [aut, cre] |
Maintainer: | Henrique Andrade <[email protected]> |
License: | GPL-2 |
Version: | 0.1.0 |
Built: | 2025-02-20 05:01:31 UTC |
Source: | https://github.com/tsukubai/hclusteasy |
Perform hierarchical clustering and generate groups based on sample dissimilarity, measured as Euclidean distance.
hca(data, method = "complete", num.groups = 3)
Argument | Description |
---|---|
data | Dataset in |
method | Agglomeration method used for hierarchical clustering: "ward.D", "ward.D2", "single", "complete", "average" (UPGMA), "mcquitty" (WPGMA), "median" (WPGMC), or "centroid" (UPGMC). Default is "complete". |
num.groups | Number of groups into which the dendrogram is cut. Default is 3. |
An integer vector in which each element gives the group assigned to the corresponding observation in the original dataset.
```r
# Load the required package
library(hclusteasy)
# Read the 'iris' dataset from the package
data("iris_uci")
# Remove column 'Species' from the iris dataset
iris <- iris_uci[, -5]
# Apply hierarchical clustering and assign groups
g <- hca(iris)
```
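The clustering step described above can be approximated with base R's stats package. The sketch below is an assumption about how such a wrapper could work (Euclidean distances from `dist()`, agglomeration with `hclust()`, and a cut with `cutree()`), not hclusteasy's actual source. `hca_sketch` is a hypothetical name, and the example uses the built-in `datasets::iris` instead of `iris_uci`.

```r
# Hypothetical re-implementation of the hca() workflow using only base R.
# Assumption: hca() computes Euclidean distances, runs hclust(), and cuts
# the tree into num.groups clusters; this is not the package's source code.
hca_sketch <- function(data, method = "complete", num.groups = 3) {
  # Pairwise Euclidean dissimilarities between observations (rows)
  d <- stats::dist(data, method = "euclidean")
  # Agglomerative clustering; 'method' takes the same names as hclust()
  tree <- stats::hclust(d, method = method)
  # Cut the dendrogram into num.groups clusters: one group label per row
  stats::cutree(tree, k = num.groups)
}

# Usage with the built-in iris data (Species column removed)
x <- datasets::iris[, -5]
g <- hca_sketch(x, method = "complete", num.groups = 3)

# Cross-tabulate the clusters against the known species
table(cluster = g, species = datasets::iris$Species)
```

Cross-tabulating against a known label, as in the last line, is one simple way to judge whether the groups are meaningful when true classes are available.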
This dataset contains 150 flower samples distributed among 3 iris species: Setosa, Versicolor, and Virginica. It has 5 columns: 4 attributes measured in centimeters (sepal length and width, and petal length and width) and one column indicating the iris species. This dataset was introduced by Ronald A. Fisher in 1936 in his classic paper on linear discriminant analysis.
data("iris_uci")
Fisher, R. A. (1988). Iris. UCI Machine Learning Repository. doi:10.24432/C56C76.
Perform data normalization.
normalization(data, type = "n0", norm = "column", na.remove = FALSE)
Argument | Description |
---|---|
data | Dataset in |
type | Type of normalization. Default is "n0". |
norm | Whether normalization is applied by "column" or by "row". Default is "column". |
na.remove | A |
Normalized dataset in data.frame format.
```r
# Load the required package
library(hclusteasy)
# Read the dataset 'iris' from the package
data("iris_uci")
# Remove the column 'Species' from the iris dataset
iris <- iris_uci[, -5]
# Apply normalization to the iris dataset
irisN <- normalization(iris, type = "n1")
```
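One common normalization is z-score standardization, z = (x - mean) / sd, which rescales values to zero mean and unit standard deviation. Which `type` code in `normalization()` (if any) corresponds to it is not stated here, so treat that mapping as an assumption. The hypothetical helper `zscore_sketch` below only illustrates how the `norm = "column"` versus `norm = "row"` choice changes the direction of the computation in base R.

```r
# Hypothetical helper illustrating column- vs row-wise z-score normalization.
# Assumption: this mirrors only the idea of the 'norm' argument; it is not
# hclusteasy's implementation, and the type codes ("n0", "n1", ...) are not modeled.
zscore_sketch <- function(data, norm = c("column", "row")) {
  norm <- match.arg(norm)
  m <- as.matrix(data)
  if (norm == "column") {
    # Standardize each variable (column): subtract its mean, divide by its sd
    out <- scale(m, center = TRUE, scale = TRUE)
  } else {
    # Standardize each observation (row): transpose, scale, transpose back
    out <- t(scale(t(m), center = TRUE, scale = TRUE))
  }
  # Drop the attributes added by scale() and return a data.frame
  attr(out, "scaled:center") <- NULL
  attr(out, "scaled:scale") <- NULL
  as.data.frame(out)
}

x <- datasets::iris[, -5]
by_col <- zscore_sketch(x, norm = "column")
by_row <- zscore_sketch(x, norm = "row")

round(colMeans(by_col), 10)      # column means are now (numerically) zero
round(apply(by_col, 2, sd), 10)  # column standard deviations are now one
```

Column-wise scaling puts variables measured on different ranges on an equal footing before distances are computed; row-wise scaling instead compares the profile shape of each observation.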
Apply PCA (Principal Component Analysis) to the data and construct a scatter plot of the first two principal components.
pca(data, groups = "none")
Argument | Description |
---|---|
data | Dataset in |
groups | Group labels used to color the observations and to draw an ellipse around each group of samples, with a confidence level of 0.98. Default is "none". |
A ggplot object.
```r
# Load the required package
library(hclusteasy)
# Read the 'iris' dataset from the package
data("iris_uci")
# Select the column "Species" (groups) from the iris dataset
species <- iris_uci[, 5]
# Remove the column "Species" from the iris dataset
iris <- iris_uci[, -5]
# Apply PCA and plot the first two components without groups
pca(iris)
# Apply PCA and plot the first two components colored by group
pca(iris, groups = species)
```
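The numbers behind such a scatter plot can be computed with base R's `prcomp()`. The sketch below assumes the variables are centered and scaled before PCA, which `pca()` may or may not do; `pca_scores_sketch` is a hypothetical name. To draw the plot, one option (an assumption about presentation, not a description of `pca()`) is ggplot2 with `geom_point()` and per-group ellipses from `stat_ellipse(level = 0.98)`, matching the documented confidence level.

```r
# Hypothetical sketch of the computation behind a PCA scatter plot, base R only.
# Assumption: variables are centered and scaled before PCA; hclusteasy's pca()
# may make a different choice. Plotting is left out so no extra packages are needed.
pca_scores_sketch <- function(data, groups = NULL) {
  fit <- stats::prcomp(data, center = TRUE, scale. = TRUE)
  # Proportion of total variance captured by each component
  var_explained <- fit$sdev^2 / sum(fit$sdev^2)
  # Coordinates of each observation on the first two components
  scores <- data.frame(PC1 = fit$x[, 1], PC2 = fit$x[, 2])
  # Optional group labels, e.g. clusters from hca() or known classes
  if (!is.null(groups)) scores$group <- factor(groups)
  list(scores = scores, var_explained = var_explained[1:2])
}

res <- pca_scores_sketch(datasets::iris[, -5], groups = datasets::iris$Species)
head(res$scores)
res$var_explained  # share of variance on PC1 and PC2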
Read dataset files in txt (space-separated), xls, or xlsx format and return the data as a data.frame.
read.data(path, col.names = FALSE, col.types = NULL)
Argument | Description |
---|---|
path | Path to the |
col.names | Logical value indicating whether the first row of the dataset should be used as column names. Use |
col.types | Character string or character vector specifying the data type of each column. Possible values are "skip", "guess", "logical", "numeric", "date", "text", or "list". Default is NULL. |
Dataset in data.frame format.
```r
# Load the package
library(hclusteasy)
# Set the file path
file_path <- system.file("extdata", "iris_uci.xlsx", package = "hclusteasy")
# Read a .xlsx dataset
iris <- read.data(file_path, col.names = TRUE)
```
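A reader like this typically dispatches on the file extension. The `col.types` values listed above coincide with the `col_types` options of `readxl::read_excel()`, which suggests (but does not confirm) that the Excel branch relies on readxl. The sketch below is a hypothetical `read_data_sketch`, not hclusteasy's source: txt files go through `utils::read.table()` with whitespace separation, and Excel files go through readxl only if it is installed.

```r
# Hypothetical reader that dispatches on the file extension, as read.data() is
# described to do. Assumptions: txt files are whitespace-separated and Excel
# files are read with the readxl package; this is not hclusteasy's source.
read_data_sketch <- function(path, col.names = FALSE, col.types = NULL) {
  ext <- tolower(tools::file_ext(path))
  if (ext == "txt") {
    # sep = "" means "any run of whitespace"; col.types is ignored here
    df <- utils::read.table(path, header = col.names, sep = "")
  } else if (ext %in% c("xls", "xlsx")) {
    if (!requireNamespace("readxl", quietly = TRUE)) {
      stop("The 'readxl' package is required to read .xls/.xlsx files.")
    }
    # readxl's col_types accepts "skip", "guess", "logical", "numeric",
    # "date", "text", and "list", or NULL to guess every column
    df <- readxl::read_excel(path, col_names = col.names, col_types = col.types)
    df <- as.data.frame(df)  # convert the tibble to a plain data.frame
  } else {
    stop("Unsupported file extension: ", ext)
  }
  df
}

# Usage: write a small space-separated file and read it back
tmp <- tempfile(fileext = ".txt")
writeLines(c("a b c", "1 2 3", "4 5 6"), tmp)
df <- read_data_sketch(tmp, col.names = TRUE)
str(df)
```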
This dataset contains 178 wine samples distributed into 3 classes. It has 14 columns: 13 chemical attributes (alcohol, malic acid, ash, alkalinity of ash, magnesium, total phenols, flavonoids, nonflavonoid phenols, proanthocyanins, color intensity, hue, OD280/OD315 of diluted wines, and proline) and one column indicating the wine class. This dataset was introduced by Forina et al. in 1991 in a study on the chemical analysis of wines grown in the same region of Italy but derived from three different cultivars.
data("wine_uci")
Aeberhard, S. and Forina, M. (1991). Wine. UCI Machine Learning Repository. doi:10.24432/C5PC7J.