The ABD group has a number of ongoing projects:
- CLAM (Clustering, Learning, and Approximation on Manifolds) is the foundational software and algorithmic framework underlying much of our work. The software implementation is constantly undergoing improvement.
- CHAODA (Clustered Hierarchical Anomaly and Outlier Detection Algorithms) is being extended to a supervised, multi-class machine learning model.
- CAKES (CLAM-Accelerated K-nearest-neighbors Entropy Scaling Search) provides fast $k$-nearest-neighbor ($k$-nn) and range ($\rho$-nn) search. It achieves perfect recall when the distance function obeys the triangle inequality, and its search time scales sublinearly (indeed, often close to constant time) as the dataset grows. Future work includes the implementation of a variety of distance functions, as well as online or streaming updates to the clustered dataset.
- panCAKES (Data compression for CAKES) seeks to implement data compression using CLAM, initially for discrete data (e.g. strings or integers) and eventually for floating-point data.
- Phylogenetic compression of biological sequence data seeks to extend CLAM compression and clustering to support large multiple sequence alignments.
- RF Anomaly Detection seeks to demonstrate CHAODA's anomaly detection on time-series data using unconventional distance functions such as dynamic time warping, together with Takens' method for projection into higher dimensions.
- Cyber Kill-Chain research explores the manifold structure of cybersecurity threat data models, with the goals of anomaly detection and similarity search.
- Adversarial machine-learning input detection maps the neuron activation patterns of an artificial neural network into a high-dimensional metric space, with the goal of using anomaly detection (including CHAODA) as a filter to identify potentially adversarial inputs to a trained network.
- CLAM Graph Visualization is a novel approach to visualizing very high-dimensional data without explicit dimensionality reduction; instead, it represents manifold connectivity as a graph. We aim eventually to build a virtual reality application on this approach.
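
To illustrate why the triangle inequality matters for the perfect-recall claim in the CAKES entry, here is a minimal sketch of exact range search over a cluster tree. It is not the CLAM or CAKES implementation: the `Cluster` class, the `build` and `range_search` functions, the two-pole splitting rule, and the choice of Euclidean distance are all assumptions made for this example. The point it shows is that a whole cluster can be skipped when the query is farther than $\rho$ plus the cluster radius from its center, without ever missing a true hit.

```python
import math
from dataclasses import dataclass, field


def euclidean(a, b):
    """Euclidean distance; any function obeying the triangle inequality would do."""
    return math.dist(a, b)


@dataclass
class Cluster:
    # Hypothetical node type for this sketch, not CLAM's data structure.
    points: list
    center: tuple
    radius: float
    children: list = field(default_factory=list)


def build(points, leaf_size=2):
    """Recursively split points into a binary tree of clusters."""
    center = points[0]  # arbitrary choice of center (an assumption)
    radius = max(euclidean(center, p) for p in points)
    node = Cluster(points, center, radius)
    if len(points) > leaf_size and radius > 0:
        # Pick two far-apart poles and assign each point to the nearer one.
        left_pole = max(points, key=lambda p: euclidean(center, p))
        right_pole = max(points, key=lambda p: euclidean(left_pole, p))
        left, right = [], []
        for p in points:
            if euclidean(p, left_pole) <= euclidean(p, right_pole):
                left.append(p)
            else:
                right.append(p)
        node.children = [build(left, leaf_size), build(right, leaf_size)]
    return node


def range_search(node, query, rho):
    """Return every point within distance rho of query."""
    d = euclidean(query, node.center)
    # Triangle inequality: for any p in the cluster,
    # dist(query, p) >= d - node.radius, so if d > rho + radius no p can match.
    if d > rho + node.radius:
        return []
    if not node.children:
        return [p for p in node.points if euclidean(query, p) <= rho]
    hits = []
    for child in node.children:
        hits.extend(range_search(child, query, rho))
    return hits
```

Because pruning only discards clusters that provably contain no hits, the result should match a brute-force scan; the speedup depends on how many clusters the bound eliminates for a given dataset and query.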