2 Coresets

2.1 What are coresets?

Bachem, Lucic, and Krause (2017) give a precise definition of coresets:

  • Coresets are small, weighted summaries of large data sets such that solutions found on the summary itself are provably competitive with solutions found on the entire data set.
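One common way to make "provably competitive" precise is sketched below. The notation is an assumption introduced here for illustration and is not quoted from the source: X is the full data set, C is the weighted summary, Q is any candidate solution (for example, a set of cluster centers), cost is the objective being minimized, and ε in (0, 1) is the tolerated relative error.

```latex
% C is called an \varepsilon-coreset of X if, for every candidate solution Q,
\[
  \bigl|\operatorname{cost}(X, Q) - \operatorname{cost}(C, Q)\bigr|
  \;\le\; \varepsilon \,\operatorname{cost}(X, Q).
\]
% Consequence: if Q_C minimizes the cost on C and Q_X minimizes it on X, then
\[
  \operatorname{cost}(X, Q_C)
  \;\le\; \frac{1}{1-\varepsilon}\,\operatorname{cost}(C, Q_C)
  \;\le\; \frac{1}{1-\varepsilon}\,\operatorname{cost}(C, Q_X)
  \;\le\; \frac{1+\varepsilon}{1-\varepsilon}\,\operatorname{cost}(X, Q_X).
\]
```

Under this reading, the best solution found on the summary is at most a factor (1+ε)/(1−ε) worse than the best solution on the full data.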

More intuitively, a coreset is a small subset of a larger data set, with a weight attached to each data point, that approximates the original data well in terms of the cost of a particular model or algorithm.
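To make the idea of a weighted summary concrete, here is a minimal sketch in Python with NumPy. It is an illustration under stated assumptions, not a coreset construction from the source: the summary is drawn by plain uniform sampling with weight n/m per point, which gives an unbiased estimate of the cost but does not carry the theoretical guarantees of a true coreset. The cost used is the squared distance to the nearest center (a k-means style objective), and the function names weighted_cost and uniform_summary, the synthetic data, and the fixed centers are all hypothetical.

```python
import numpy as np


def weighted_cost(points, weights, centers):
    """Weighted sum of squared distances from each point to its nearest center."""
    # Pairwise squared distances, shape (n_points, n_centers).
    d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return float((weights * d2.min(axis=1)).sum())


def uniform_summary(points, m, rng):
    """Illustrative weighted summary: m points sampled uniformly, each weighted n/m.

    NOTE: uniform sampling is only a stand-in for a real coreset construction.
    """
    n = len(points)
    idx = rng.choice(n, size=m, replace=False)
    weights = np.full(m, n / m)  # each sampled point stands in for n/m originals
    return points[idx], weights


rng = np.random.default_rng(0)
data = rng.normal(size=(10_000, 2))            # synthetic data, for illustration only
centers = np.array([[-1.0, 0.0], [1.0, 0.0]])  # an arbitrary candidate solution

summary, w = uniform_summary(data, 500, rng)

full = weighted_cost(data, np.ones(len(data)), centers)  # cost on all the data
approx = weighted_cost(summary, w, centers)              # cost on the weighted summary
rel_err = abs(full - approx) / full

print(f"full cost: {full:.1f}, summary cost: {approx:.1f}, relative error: {rel_err:.3f}")
```

The weights are what let 500 points stand in for 10,000: the weighted cost on the summary is on the same scale as the cost on the full data, so the two can be compared directly for any candidate set of centers.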

Coresets are especially useful for models that do not scale well with the amount of data. If a model's time complexity cannot be improved any further, training on a coreset is a realistic way to keep training time manageable. Coresets also come with theoretical guarantees, which sets them apart from many other approximation techniques and makes them dependable in practice.

In the next chapter, we introduce coresets for the KMeans clustering algorithm.