Access the implementation of an automatic time series clustering system using Fisher’s dynamic programming algorithm in my Github repository.

Overview

Automatic classification, or clustering, aims to group similar objects into homogeneous classes based on their characteristics. Time series data, which represents how a quantity changes over time, poses unique challenges for clustering. This article presents an implementation of Fisher’s dynamic programming algorithm for classifying time series datasets into sequential categories. The performance of this approach is compared to alternative methods.

Background

As part of our unsupervised learning coursework, we explore clustering methods that can identify sequential classes in time series data. Fisher’s dynamic programming algorithm, originally used in speech recognition to segment audio signals and identify speakers, is well-suited for this task. By segmenting a data sequence into homogeneous portions, we can better organize and understand the underlying patterns.

MNIST Sample

Results and Discussion

Implementing Fisher’s dynamic programming algorithm allowed us to compare its performance with hierarchical clustering (CAH) and K-means methods. The key advantage of Fisher’s algorithm is its ability to account for the temporal nature of the data, whereas CAH Ward and K-means treat the dataset as points in a multi-dimensional space without considering time.

Despite some optimization using specialized packages, the non-linear complexity of Fisher’s algorithm could limit its efficiency on larger datasets, especially when run on CPUs. Future work should evaluate the algorithm’s performance on GPUs, which are better suited for intensive matrix computations.

Key Takeaways

  • Fisher’s dynamic programming algorithm is effective for clustering time series data into sequential categories
  • The algorithm accounts for the temporal nature of the data, unlike CAH Ward and K-means
  • Optimizing the algorithm’s implementation and running it on GPUs could improve efficiency for larger datasets
  • Understanding the strengths and limitations of different clustering approaches is crucial for selecting the most appropriate method for a given dataset and problem

For a deeper dive into the implementation details and to access the code, visit the Github repository.