Configure Renewable Resource Clusters
This guide explains how to set up renewable resource clusters (wind and solar sites) using PowerGenome's flexible clustering system. The clustering approach allows you to divide renewable resources by performance characteristics, geography, and cost attributes.
Overview
Renewable clustering divides potential wind and solar sites into discrete groups that can be invested in by the capacity expansion model. The clustering system supports four complementary mechanisms:
- Filters - Exclude sites that fail a criterion (e.g., keep only sites with LCOE ≤ $50/MWh)
- Groups - Divide sites by categorical features (e.g., wind class, county)
- Bins - Divide sites by continuous features into ranges (e.g., LCOE quartiles)
- Clusters - Apply k-means clustering within each group/bin combination
Total clusters = (# groups) × (# bins) × (# clusters per combination)
For example: 3 wind classes × 4 LCOE bins × 2 k-means clusters = 24 total resource clusters
When to Use Renewable Clusters
Use renewable clusters when:
- Geographic diversity matters (wind/solar in different regions)
- Cost variation is significant (want to build cheapest sites first)
- Performance differences exist (Class 3 vs Class 7 wind)
- Resource limits apply (land availability in each county)
Skip clustering if you have a simple system where all renewable sites are similar.
What the resource data should contain
PowerGenome expects each renewable site record to include at least:
- lcoe: Pre-calculated levelized cost that blends capital cost and capacity factor
- cf (or capacity_factor): Performance metric used for clustering
- interconnect_mw or similar: Interconnection cost that will be re-aggregated when you change regional aggregations
Wind/solar “resource group” files generated for projects come with exactly these fields. If you add or reaggregate regions, interconnection costs will need to be recalculated for the new region layout.
Basic Configuration Pattern
Renewable clusters are defined in the renewable_clusters section:
renewable_clusters:
  - region: all                # Required: region(s) to include
    technology: landbasedwind  # Required: technology name
    filter: [...]              # Optional: exclude sites
    group: [...]               # Optional: categorical divisions
    bin: [...]                 # Optional: continuous feature ranges
    cluster: [...]             # Optional: k-means within groups
    group_modifiers: [...]     # Optional: adjust costs by group
Step 1: Start with Filters (Recommended)
Filters remove sites that don't meet quality thresholds. If your resource data includes a cost metric like lcoe, start by filtering on it.
Why Filter on LCOE?
LCOE (Levelized Cost of Energy) combines capacity factor and capital costs. Filtering on LCOE ensures you only consider economically viable sites, reducing computational complexity.
renewable_clusters:
  - region: all
    technology: landbasedwind
    filter:
      - feature: lcoe
        max: 50  # Only sites with LCOE ≤ $50/MWh
Filter structure: Each filter in the list must have:
- feature: Column name from resource data (e.g., lcoe, capacity_factor, dist_to_tx_km)
- max: Maximum value (optional, sites with feature > max are excluded)
- min: Minimum value (optional, sites with feature < min are excluded)
Example filters:
filter:
  - feature: lcoe
    max: 50   # Exclude sites with LCOE > $50/MWh
  - feature: cf
    min: 0.3  # Exclude sites with CF < 30%
  - feature: dist_to_tx_km
    max: 100  # Exclude sites > 100 km from transmission
Step 2: Add Bins for Cost Stratification (Recommended)
After filtering, bin on LCOE to divide resources into cost tiers. This allows the model to build the cheapest sites first.
LCOE Binning Example
renewable_clusters:
  - region: all
    technology: landbasedwind
    filter:
      - feature: lcoe
        max: 50
    bin:
      - feature: lcoe
        q: 4                  # Create 4 quantile bins (quartiles)
        weights: capacity_mw  # Weight by capacity when creating bins
Result: Sites divided into 4 bins (e.g., $0-$30, $30-$38, $38-$44, $44-$50 LCOE ranges)
Quick-start ideas
- First filter on lcoe to drop obviously uneconomic sites
- Use mw_per_bin to target roughly one bin per ~5-10 GW of available capacity
- If early runs show large unused capacity in a specific region, add a region-specific filter with a tighter lcoe cap instead of shrinking other regions
Bin structure: Each bin entry is a dictionary with:
- feature: Column name to bin on (required)
- bins: Integer number of equal-width bins OR list of bin edges (e.g., [0, 30, 50, 75])
- q: Integer number of quantile bins OR list of quantile edges (e.g., [0, 0.25, 0.5, 0.75, 1.0])
- weights: Column to weight bins by (optional, e.g., mw to weight by capacity)
- mw_per_bin: Alternative to bins/q - calculate number of bins from total MW (optional)
- mw_per_q: Alternative to bins/q - calculate number of quantiles from total MW (optional)
Note: Use either bins OR q, not both. If weights is specified with q, weighted quantiles are calculated.
Binning Methods
Quantile binning (recommended for balanced clusters):
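A minimal sketch of this form, built only from the q and weights keys described in the bin structure; the lcoe and capacity_mw column names are assumptions about your resource data:

```yaml
# Fragment of a renewable_clusters entry (illustrative values)
bin:
  - feature: lcoe
    q: 4                  # 4 quantile bins
    weights: capacity_mw  # Optional: weight quantiles by capacity
```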
Equal-width binning (may create unbalanced bins):
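A minimal sketch, assuming an lcoe column; an integer bins value requests that many equal-width bins, as described in the bin structure:

```yaml
# Fragment of a renewable_clusters entry (illustrative values)
bin:
  - feature: lcoe
    bins: 4  # 4 equal-width bins across the LCOE range
```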
Explicit bin edges:
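A minimal sketch, assuming an lcoe column and reusing the example edge list from the bin structure; the edge values are illustrative, not recommendations:

```yaml
# Fragment of a renewable_clusters entry (illustrative values)
bin:
  - feature: lcoe
    bins: [0, 30, 50, 75]  # Edges define 3 bins: 0-30, 30-50, 50-75
```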
Capacity-based bins:
bin:
  - feature: lcoe
    mw_per_bin: 10000  # ~10 GW per bin, number of bins calculated automatically
    weights: capacity_mw
Binning on Other Features
# Bin by capacity factor (performance tiers)
bin:
  - feature: cf
    q: 3
    weights: capacity_mw  # Equal capacity in each performance tier

# Bin by distance to transmission
bin:
  - feature: dist_to_tx_km
    bins: [0, 25, 50, 100, 200]  # Explicit distance ranges
Multiple Bins
You can apply multiple binning dimensions sequentially:
bin:
  - feature: lcoe
    q: 3
    weights: capacity_mw
  - feature: cf
    q: 2  # Further divide each LCOE bin by CF
Result: 3 LCOE bins × 2 CF bins = 6 total bin combinations
Step 3: Use Groups for Categorical Divisions
Groups divide sites by categorical features like wind class, county, or interconnection zone.
Wind Class Grouping
Most common use case - divide by resource quality:
renewable_clusters:
  - region: all
    technology: landbasedwind
    group:
      - class  # Divides by wind class (Class 3, Class 4, etc.)
    cluster:
      - feature: cf
        n_clusters: 2
        method: agg
Result: Each class gets 2 clusters (e.g., Class 3 → 2 clusters, Class 4 → 2 clusters)
Geographic Grouping
Divide by county or zone to respect land availability:
renewable_clusters:
  - region: all
    technology: utilitypv
    group:
      - county  # One or more clusters per county
    cluster:
      - feature: lcoe
        n_clusters: 1  # Single representative site per county
        method: agg
Multiple Groups
Combine categorical features (multiplies cluster count):
renewable_clusters:
  - region: all
    technology: landbasedwind
    group:
      - class       # 3 wind classes
      - ipm_region  # 5 IPM regions (use different column than model region)
    cluster:
      - feature: cf
        n_clusters: 2
        method: agg
Total clusters = 3 classes × 5 regions × 2 k-means = 30 clusters
Step 4: Configure K-Means Clustering
After grouping/binning, k-means clusters aggregate similar sites within each combination.
Basic Clustering
renewable_clusters:
  - region: all
    technology: landbasedwind
    cluster:
      - feature: cf    # Cluster on capacity factor
        n_clusters: 3  # 3 clusters per group/bin
        method: agg    # Clustering algorithm
Cluster structure: Each cluster entry is a dictionary with:
- feature: Column name(s) to cluster on (required). Can be a string or list of strings. Use profile to cluster on generation profiles.
- method: Clustering algorithm - kmeans, agglomerative/agg, max_distance (required)
- n_clusters: Number of clusters (required)
- mw_per_cluster: Alternative to n_clusters - calculate number of clusters from total MW (optional)
Clustering on Multiple Features
Specify a list of features to cluster on multiple columns simultaneously:
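A sketch of a multi-feature entry, assuming the resource data has latitude, longitude, and capacity_factor columns; the kmeans method is an illustrative choice:

```yaml
# Fragment of a renewable_clusters entry (illustrative values)
cluster:
  - feature: [latitude, longitude, capacity_factor]  # Location plus performance
    n_clusters: 2
    method: kmeans
```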
This creates 2 clusters based on geographic proximity AND performance similarity.
Common feature choices:
- latitude, longitude: Geographic proximity
- capacity_factor or cf: Performance similarity
- lcoe: Cost similarity
- elevation, slope: Terrain characteristics
- profile: Generation time series (only works with agglomerative method)
Clustering on Generation Profiles
cluster:
  - feature: profile       # Cluster on hourly generation patterns
    n_clusters: 3
    method: agglomerative  # Required for profile clustering
Alternative Clustering Methods
# Hierarchical/agglomerative clustering
cluster:
  - feature: cf
    n_clusters: 4
    method: agg  # Short for agglomerative

# Max-distance clustering (ensures spatial diversity)
cluster:
  - feature: [latitude, longitude]
    n_clusters: 5
    method: max_distance
Limiting Total Clusters
Note: The total number of clusters is the product of groups × bins × n_clusters. To manage cluster count:
- Reduce the number of groups
- Reduce the number of bins (q or bins value)
- Reduce n_clusters
- Be selective about which regions/technologies to cluster
renewable_clusters:
  - region: all
    technology: landbasedwind
    group: [class, ipm_region]  # Could create 3 × 5 = 15 combinations
    bin:
      - feature: lcoe
        q: 2  # × 2 bins = 30 combinations (reduced from 4)
    cluster:
      - feature: cf
        n_clusters: 1  # × 1 cluster = 30 total
        method: agg
Reducing bins/clusters can simplify the model while preserving geographic and cost diversity.
Step 5: Modify Costs by Group
Group modifiers adjust costs for specific groups after clustering (e.g., higher interconnection costs in certain regions).
renewable_clusters:
  - region: all
    technology: landbasedwind
    group: [class]
    cluster:
      - feature: cf
        n_clusters: 2
        method: agg
    group_modifiers:
      - group: class
        group_value: 3
        capex_mw: [mul, 1.1]    # 10% higher capex for Class 3 wind
        fixed_o_m_mw: [add, 5]  # +$5/MW-yr O&M
      - group: class
        group_value: 7
        capex_mw: [mul, 0.95]   # 5% lower capex for Class 7
Modifier Operations
- [mul, factor]: Multiply existing value
- [add, value]: Add to existing value
- [sub, value]: Subtract from existing value
- Scalar: Replace value entirely
group_modifiers:
  - group: ipm_region
    group_value: CA_N
    capex_mw: [mul, 1.2]        # 20% cost increase
    interconnect_mw: [add, 50]  # +$50/MW interconnection
  - group: ipm_region
    group_value: TX
    capex_mw: 800000            # Set to $800k/MW (replaces value)
Calculating Total Cluster Count
Understanding how cluster count multiplies is critical for managing model size.
Formula
Total Clusters = (# filter-passing groups) × (# bins) × (# k-means clusters)
- If no groups: 1 group assumed
- If no bins: 1 bin assumed
- If no k-means: 1 cluster assumed
Examples
Simple k-means only:
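One configuration consistent with the count below, as a sketch; the cf feature and kmeans method are assumptions chosen for illustration:

```yaml
# No groups, no bins: 1 × 1 × 5
cluster:
  - feature: cf
    n_clusters: 5
    method: kmeans
```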
→ 5 total clusters
Groups + k-means:
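A sketch that would give the count below, assuming the class column has 3 distinct values after filtering:

```yaml
# 3 classes (assumed) × 1 bin × 2 clusters
group: [class]
cluster:
  - feature: cf
    n_clusters: 2
    method: kmeans
```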
→ 3 × 2 = 6 total clusters
Bins + k-means:
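A sketch matching the count below; the lcoe and cf column names are assumptions about the resource data:

```yaml
# 1 group × 4 quantile bins × 3 clusters
bin:
  - feature: lcoe
    q: 4
cluster:
  - feature: cf
    n_clusters: 3
    method: kmeans
```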
→ 4 × 3 = 12 total clusters
Groups + bins + k-means:
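A sketch combining all three mechanisms, assuming the region column has 5 distinct values after filtering:

```yaml
# 5 regions (assumed) × 3 quantile bins × 2 clusters
group: [region]
bin:
  - feature: lcoe
    q: 3
cluster:
  - feature: cf
    n_clusters: 2
    method: kmeans
```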
→ 5 × 3 × 2 = 30 total clusters
Groups + bins (no k-means):
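A sketch of how the count below could arise, assuming 3 distinct class values and 20 distinct county values; with no cluster entry, one cluster per group/bin combination is assumed:

```yaml
# 3 classes (assumed) × 20 counties (assumed) × 4 quantile bins × 1
group: [class, county]
bin:
  - feature: lcoe
    q: 4
```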
→ 3 × 20 × 4 × 1 = 240 total clusters (potentially too many!)
Complete Examples
Example 1: Simple Wind Clustering (Recommended Starting Point)
renewable_clusters:
  - region: all
    technology: landbasedwind
    # Filter to economically viable sites
    filter:
      - feature: lcoe
        max: 50
      - feature: capacity_mw
        min: 5  # Sites must be ≥ 5 MW
    # Divide into 4 cost tiers
    bin:
      - feature: lcoe
        q: 4                  # 4 quantiles (quartiles)
        weights: capacity_mw  # Equal capacity in each bin
    # 2 k-means clusters per cost tier (geographic diversity)
    cluster:
      - feature: [latitude, longitude, capacity_factor]
        n_clusters: 2
        method: kmeans
Result: 4 bins × 2 clusters = 8 wind resource clusters
Example 2: Wind with Class Groups
renewable_clusters:
  - region: all
    technology: landbasedwind
    filter:
      - feature: lcoe
        max: 60
    # Group by wind class (categorical)
    group: [class]
    # Bin by cost within each class
    bin:
      - feature: lcoe
        q: 3
        weights: capacity_mw
    # 1 representative cluster per class/bin
    cluster:
      - feature: cf
        n_clusters: 1
        method: agg
    # Adjust costs for lower-quality wind
    group_modifiers:
      - group: class
        group_value: 3
        capex_mw: [mul, 1.15]  # 15% higher costs
Result: 3 classes × 3 bins × 1 cluster = 9 wind clusters
Example 3: Solar with Geographic Groups
renewable_clusters:
  - region: all
    technology: utilitypv
    filter:
      - feature: cf
        min: 0.18
      - feature: lcoe
        max: 40
    # Group by county for land use limits
    group: [county]
    # No binning (costs similar within counties)
    # Single representative site per county
    cluster:
      - feature: [latitude, longitude]
        n_clusters: 1
        method: kmeans
    # Higher interconnection costs in remote counties
    group_modifiers:
      - group: county
        group_value: Rural_County_A
        interconnect_mw: [add, 100]
      - group: county
        group_value: Rural_County_B
        interconnect_mw: [add, 150]
Result: 1 cluster per county (e.g., 25 counties = 25 solar clusters)
Example 4: Offshore Wind (Performance-Based)
renewable_clusters:
  - region: all
    technology: offshorewind
    filter:
      - feature: cf
        min: 0.35  # Only high-quality sites
      - feature: dist_to_shore_km
        max: 50    # Within 50 km of shore
    # Bin by performance (no geographic groups - limited areas)
    bin:
      - feature: cf
        bins: 3  # 3 equal-width bins
    # 2 clusters per performance tier
    cluster:
      - feature: [latitude, longitude, water_depth]
        n_clusters: 2
        method: agg
Result: 3 CF bins × 2 clusters = 6 offshore wind clusters
Example 5: Complex Multi-Region System
renewable_clusters:
  - region: [TX, OK, KS, NM]  # Great Plains only
    technology: landbasedwind
    filter:
      - feature: lcoe
        max: 55
    # Group by state and wind class
    group: [region, class]
    # Bin by cost within each group
    bin:
      - feature: lcoe
        q: 2  # Keep bins low with multiple groups
        weights: capacity_mw
    # Small number of clusters per group/bin
    cluster:
      - feature: cf
        n_clusters: 1
        method: agg
    group_modifiers:
      - group: ipm_region
        group_value: TX_Class7
        capex_mw: [mul, 0.9]  # Best TX wind
      - group: ipm_region
        group_value: NM_Class3
        capex_mw: [mul, 1.1]  # Lower quality NM wind
Result: 4 regions × ~2 classes × 2 bins × 1 cluster = ~16 clusters
Best Practices
1. Start Simple, Add Complexity
Phase 1 - Filter + Bins:
filter:
  - feature: lcoe
    max: 50
bin:
  - feature: lcoe
    q: 4
cluster:
  - feature: cf
    n_clusters: 2
    method: agg
Phase 2 - Add groups if needed:
filter:
  - feature: lcoe
    max: 50
group: [class]  # Add categorical division
bin:
  - feature: lcoe
    q: 3  # Reduce bins to manage count
cluster:
  - feature: cf
    n_clusters: 2
    method: agg
2. Always Filter on LCOE First (If Available)
Filtering removes uneconomical sites before clustering, reducing:
- Computational time
- Memory usage
- Irrelevant resource options
# Good: Filter before clustering
filter:
  - feature: lcoe
    max: 60
cluster:
  - feature: cf
    n_clusters: 10  # Only clusters good sites
    method: agg

# Avoid: Clustering everything
cluster:
  - feature: cf
    n_clusters: 50  # Includes expensive sites
    method: agg
3. Use Quantile Binning for Balanced Clusters
Quantile binning ensures each cost tier has similar site counts:
bin:
  - feature: lcoe
    q: 4                  # Each bin has ~25% of sites (quartiles)
    weights: capacity_mw  # Can also weight by capacity
Equal-width binning may create bins with very few sites.
4. Match Cluster Count to Model Scale
Small model (1-5 regions):
- 5-15 clusters per technology
- Simple binning (2-3 bins)
- Single k-means cluster per bin
Medium model (5-15 regions):
- 15-40 clusters per technology
- Groups by region or class
- 2-3 bins × 1-2 k-means clusters
Large model (15+ regions):
- 40-100 clusters per technology
- Multiple groups (region + class)
- Keep bins and n_clusters low to manage total count
- Consider 1 k-means cluster per group/bin
5. Use Group Modifiers Sparingly
Only modify costs when you have specific regional data:
# Good: Specific, data-driven adjustment
group_modifiers:
  - group: ipm_region
    group_value: CA_N
    interconnect_mw: [add, 75]  # Known higher interconnection costs

# Avoid: Arbitrary adjustments
group_modifiers:
  - group: ipm_region
    group_value: Region1
    capex_mw: [mul, 1.05]  # Why 5%?
6. Check Cluster Counts Before Running
Calculate expected clusters:
- Count unique values in grouping columns
- Multiply by bins and k-means clusters
- Ensure total is reasonable (< 100 per technology for most models)
7. Balance Geographic and Cost Diversity
Good balance:
group: [region]  # Geographic diversity
bin:
  - feature: lcoe
    q: 3  # Cost diversity
cluster:
  - feature: cf
    n_clusters: 1  # Simple representative
    method: agg
Too geographic-heavy (ignores costs):
group: [region, county, class]  # Very specific locations
# No cost binning - might select expensive sites
Too cost-heavy (ignores geography):
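A sketch of what such a configuration might look like, assuming an lcoe column; the bin count is illustrative:

```yaml
# No groups and no geographic features: sites from distant areas can share a cluster
bin:
  - feature: lcoe
    q: 10  # Many cost tiers
cluster:
  - feature: lcoe
    n_clusters: 1
    method: agg
```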
Troubleshooting
Too Many Clusters
Problem: 200+ clusters slow down the model
Solutions:
- Reduce number of bins (lower q or bins value)
- Use fewer groups
- Reduce n_clusters
# Before: 5 regions × 4 bins × 3 clusters = 60
group: [region]
bin:
  - feature: lcoe
    q: 4
cluster:
  - feature: cf
    n_clusters: 3
    method: agg

# After: 5 regions × 2 bins × 2 clusters = 20
group: [region]
bin:
  - feature: lcoe
    q: 2
cluster:
  - feature: cf
    n_clusters: 2
    method: agg
All Clusters in One Region
Problem: All selected sites in the same area
Solutions:
- Add geographic groups
- Include lat/lon in clustering features
- Use max_distance clustering method
# Solution 1: Group by region
group: [region]
cluster:
  - feature: cf
    n_clusters: 2
    method: kmeans

# Solution 2: Geographic features
cluster:
  - feature: [latitude, longitude, lcoe]
    n_clusters: 3
    method: kmeans

# Solution 3: Max-distance
cluster:
  - feature: [latitude, longitude]
    method: max_distance
    n_clusters: 5
Expensive Sites Selected
Problem: High-cost clusters appear in optimal solution
Solutions:
- Tighten the max value of the lcoe filter
- Add more cost bins
- Check that bins are actually dividing by cost
# Tighter filter
filter:
  - feature: lcoe
    max: 45  # Was 60

# More cost granularity
bin:
  - feature: lcoe
    q: 5  # Was 3
Missing Expected Groups
Problem: Fewer clusters than expected
Cause: Not all group combinations exist in filtered data
Solution: Check resource data to verify group values exist after filtering
Data Requirements
Renewable clustering requires resource group profiles with site-level data.
Required Data Files
Located in RESOURCE_GROUP_PROFILES directory:
RESOURCE_GROUP_PROFILES/
├── LandbasedWind_Class1_resource_groups.parquet
├── UtilityPV_Class1_resource_groups.parquet
└── OffshoreWind_Class1_resource_groups.parquet
Required Columns
Minimum:
- region: Model region name
- latitude, longitude: Coordinates
- capacity_mw: Site capacity
- capacity_factor: Average CF
- profile_id: Links to generation profile
Recommended:
- lcoe: For filtering and binning
- class: Wind/solar resource class
- county: For geographic grouping
Optional:
- dist_to_tx_km: Distance to transmission
- elevation, slope: Terrain features
- interconnect_mw: Interconnection cost
Generation Profiles
Hourly profiles in same directory:
Each profile_id in resource groups must have a corresponding profile file.
Related Settings
- New-Build Resources: Base technology costs
- Resource Tags: GenX resource classifications
- Regions: Model region definitions
- Data Tables: Table configuration
Next Steps
- Verify data: Check resource group files have required columns
- Start simple: Filter + bin on LCOE with 2-3 k-means clusters
- Examine results: Look at cluster sizes and locations
- Iterate: Add groups or bins based on model needs
- Optimize: Adjust cluster count for runtime vs. resolution trade-off