C# power set for large set sizes
Recently for a Microsoft ML.NET machine learning project I wanted to generate a power set for choosing the optimal set of features for the model. Unfortunately all the existing examples I found were limited to a set size of 31 or 32 as they were using integer types and also attempting to return the complete power set in a single collection bounding them to collection size limits.
My machine learning model contained 33 features and while a power set of that size would contain over eight billion sets I just wanted to evaluate power sets with 31-33 elements as an initial step which yields a more reasonable set size of 529. To solve the problem I wrote the following code that uses 64-bit unsigned integers and returns the sets as an enumerable to avoid the excessive memory use of attempting to create the complete collection in memory.
C# Source Code
static void Main()
{
// Following test of power set of length 31 to 33 takes around
// six minutes on a Core i7-4770K CPU @ 3.5 GHz.
var testData = Enumerable.Range(0, 33).ToArray();
int totalCount = 0;
foreach (var set in PowerSet(testData, 31))
{
Console.WriteLine(string.Join(",", set));
totalCount++;
}
Console.WriteLine(totalCount);
}
public static IEnumerable PowerSet(T[] seq, int minLength = 0)
{
if (seq == null)
{
throw new ArgumentNullException(nameof(seq));
}
var enumerable = Enumerable.Range(0, seq.Length);
ulong powerSetSize = 1UL << seq.Length;
ulong startPoint = (1UL << minLength) - 1;
for (ulong curVal = startPoint; curVal < powerSetSize; curVal++)
{
int bitCount = CountBits(curVal);
if (bitCount >= minLength)
{
yield return enumerable
.Where(e => (curVal & (1UL << e)) > 0)
.Select(e => seq[e])
.ToArray();
}
}
}
// Following by Jon Skeet / Brian Kernighan, see https://stackoverflow.com/a/12171691/1599751
public static int CountBits(ulong value)
{
int count = 0;
while (value != 0)
{
count++;
value &= value - 1;
}
return count;
}