Data Tables Settings¶

PowerGenome loads data from tables (CSV, Parquet, DuckDB) configured in settings. Table configurations support filtering, column selection, and scenario-based data loading through the DataManager.

Table Configuration Patterns¶

Simple Configuration¶

Point to a filename in data_location:

generation_table: generators.csv
fuel_prices_table: fuel_prices.parquet
demand_table: hourly_demand.csv

Files are loaded from the folder specified in data_location.

Advanced Configuration¶

Use dictionary format for filtering and column selection:

generation_table:
  table_name: generators.parquet
  scenario: high_retirements
  filters:
    - - [operating_year, '<=', 2030]
      - [retirement_year, '>', 2030]
  columns: [plant_id, technology, capacity_mw, heat_rate, region]

Parameters:

table_name: Filename or database table
scenario: Quick filter for scenario column
filters: DNF (Disjunctive Normal Form) filter logic
columns: Limit columns loaded (for large files)

Filter Syntax¶

DNF (Disjunctive Normal Form)¶

Filters use nested lists representing (A AND B) OR (C AND D) logic:

filters:
  - - [column1, operator, value1]
    - [column2, operator, value2]  # AND with above
  - - [column3, operator, value3]  # OR with first group

Operators:

= or ==: Equality
!=: Not equal
>: Greater than
>=: Greater than or equal
<: Less than
<=: Less than or equal
in: Value in list
not in: Value not in list

Filter Examples¶

Simple filter (single condition):

filters:
  - - [year, '=', 2030]

AND filter (both conditions must be true):

filters:
  - - [year, '>=', 2030]
    - [region, '=', 'CA_N']

OR filter (either condition can be true):

filters:
  - - [region, '=', 'CA_N']
  - - [region, '=', 'CA_S']

Complex filter ((A AND B) OR C):

filters:
  - - [year, '=', 2030]
    - [scenario, '=', 'high']
  - - [year, '=', 2040]

List membership:

filters:
  - - [region, 'in', ['CA_N', 'CA_S', 'AZ']]
    - [technology, 'not in', ['Coal', 'Oil']]

Scenario Convenience Filter¶

The scenario parameter is shorthand for a scenario column filter:

# These are equivalent:
generation_table:
  table_name: generators.parquet
  scenario: baseline

generation_table:
  table_name: generators.parquet
  filters:
    - - [scenario, '=', 'baseline']

Standard Table Mappings¶

PowerGenome expects certain table names for core data:

Setting Parameter	Standard Name	Purpose
`generation_table`	`generation`	Existing generators
`demand_table`	`demand`	Hourly demand profiles
`fuel_prices_table`	`fuel_prices`	Fuel cost time series
`transmission_table`	`transmission`	Network constraints
`plant_region_table`	`plant_region`	Plant location mapping
`capacity_limit_spur_table`	`capacity_limit_spur`	New-build capacity limits
`dg_capacity_table`	`dg_capacity`	Distributed gen capacity
`dg_profiles_table`	`dg_profiles`	Distributed gen profiles

Core Data Tables¶

Generation Table¶

Setting: generation_table Purpose: Existing power plant data

Required columns:

plant_id or plant_id_eia: Unique plant identifier
technology: Technology type
capacity_mw: Nameplate capacity
region: Geographic region

Optional columns:

heat_rate_mmbtu_mwh: Thermal efficiency
operating_year: Year plant started
retirement_year: Planned retirement
fixed_o_m_mw: Fixed O&M costs
variable_o_m_mwh: Variable O&M costs
minimum_load_mw: Minimum stable operation
fuel: Fuel type

Example:

generation_table:
  table_name: generators.parquet
  filters:
    - - [operating_year, '<=', 2030]
      - [capacity_mw, '>', 10]  # Exclude tiny plants

Demand Table¶

Setting: demand_table Purpose: Hourly electricity demand in tidy format

Required columns:

region: Model region name
time_index: Hour index (1 to number of hours)
load_mw: Demand in MW

Optional columns:

year: Model year (for multi-year data)
scenario: Scenario identifier
weather_year: Weather data year

Format: Tidy/long format with one row per region-time observation.

Example:

demand_table:
  table_name: demand_timeseries.parquet
  scenario: high_ev
  filters:
    - - [year, '=', 2030]

Example demand CSV (tidy format):

time_index,weather_year,region,load_mw,year
1,2012,CA_N,15234.5,2030
2,2012,CA_N,14123.2,2030
3,2012,CA_N,13890.4,2030
1,2012,CA_S,12450.8,2030
2,2012,CA_S,11234.5,2030
...

Fuel Prices Table¶

Setting: fuel_prices_table Purpose: Fuel cost projections

Required columns:

fuel: Fuel name (coal, naturalgas, distillate, uranium, etc.)
region: Region name
year: Calendar year
price: Price ($/MMBtu)

Optional columns:

scenario: Price scenario
data_year: Data vintage year
dollar_year: USD year (for inflation adjustment)
month: Monthly prices (if seasonal)

Regional Coverage

The fuel price table must include prices for all base regions in your model. For example, if you have an aggregated region CA composed of base regions CA_N and CA_S, your fuel price table must contain separate rows for CA_N and CA_S. PowerGenome will automatically calculate the average price for CA from its constituent base regions.

Example:

fuel_prices_table:
  table_name: fuel_prices.parquet
  scenario: high_gas

Example fuel_prices.csv:

fuel,region,year,price,scenario,dollar_year
coal,CA_N,2030,2.5,reference,2024
coal,CA_S,2030,2.6,reference,2024
naturalgas,CA_N,2030,4.2,reference,2024
naturalgas,CA_S,2030,4.3,reference,2024
uranium,CA_N,2030,0.8,reference,2024
uranium,CA_S,2030,0.8,reference,2024

Transmission Table¶

Setting: transmission_table Purpose: Inter-regional transmission constraints

Required columns:

transmission_path_name: Line identifier
start_region: Origin region
dest_region: Destination region
transmission_line_mw: Existing capacity

Optional columns:

max_transmission_mw: Maximum capacity
distance_miles: Line length
line_loss_pct: Transmission losses

Example:

transmission_table: transmission_network.csv

Distributed Generation Tables¶

DG Capacity Table¶

Setting: dg_capacity_table Purpose: Distributed generation capacity by region/year

Required columns:

region: Model region
year: Model year
capacity_mw: DG capacity

Optional columns:

scenario: DG adoption scenario
technology: DG technology type

Example:

dg_capacity_table:
  table_name: dg_capacity.parquet
  scenario: high_solar

DG Profiles Table¶

Setting: dg_profiles_table Purpose: Hourly generation profiles for DG

Required columns:

region: Model region
hour: Hour of year
cf: Capacity factor (0-1)

Optional columns:

weather_year: Weather data vintage
technology: DG type

Example:

dg_profiles_table: dg_generation_profiles.parquet

Renewable Resource Tables¶

Resource Group Profiles¶

Setting: RESOURCE_GROUP_PROFILES (path) Purpose: Hourly generation profiles for renewable clusters

File naming: {technology}_profiles_{suffix}.csv or .parquet

Format: Tidy format with columns:

weather_year: Weather data year
time_index: Hour index (1 to number of hours)
site_id: Resource cluster/site identifier
value: Capacity factor (0.0 to 1.0)

Example:

RESOURCE_GROUP_PROFILES: /data/nrel_profiles/

Files in folder:

landbasedwind_profiles_class3.parquet
utilitypv_profiles_class1.parquet
offshorewind_profiles_class1.parquet

Example profile CSV (tidy format):

weather_year,time_index,site_id,value
2012,1,3244,0.45
2012,2,3244,0.42
2012,3,3244,0.48
2012,1,3818,0.52
2012,2,3818,0.48
...

Policy and Cost Tables¶

Emission Policies¶

Setting: emission_policies_fn Purpose: RPS, CES, carbon constraints by case/region/year

Required columns:

case_id: Case identifier
year: Model year
region: Model region (or "all")

Policy columns (example):

RPS: Renewable portfolio standard (fraction)
CES: Clean energy standard (fraction)
CO2_cap: CO2 emissions limit (tonnes)

Example:

emission_policies_fn: emission_policies.csv

Cost Multipliers¶

Setting: cost_multiplier_fn Purpose: Regional construction cost adjustments

Required columns:

region: Region name
technology: Technology name
multiplier: Cost multiplier (1.2 = 20% increase)

Example:

cost_multiplier_fn: regional_cost_multipliers.csv

Capacity Limits and Spur Lines¶

Setting: capacity_limit_spur_fn Purpose: New-build capacity limits and connection costs

Required columns:

region: Model region
technology: Technology name
max_capacity: Maximum capacity (MW)
spur_miles: Spur line distance

Optional columns:

cluster: Resource cluster ID
spur_capex_mw_mile: Spur line cost

Example:

capacity_limit_spur_fn: capacity_limits.csv

Data File Formats¶

CSV Files¶

Standard CSV format with header row:

plant_id,technology,capacity_mw,region
1,Coal,500,CA_N
2,NaturalGas_CC,300,CA_S

Parquet Files¶

Binary columnar format (more efficient for large datasets):

import pandas as pd
df = pd.read_parquet("generators.parquet")

PowerGenome automatically detects and loads Parquet files.

Database Tables¶

Database tables (SQLite or DuckDB format):

data_location: /path/to/database.duckdb

generation_table:
  table_name: existing_generators  # Table name in database

Both SQLite (.db, .sqlite) and DuckDB (.duckdb) databases are supported:

# SQLite database
data_location: /path/to/powergenome_data.db

# DuckDB database
data_location: /path/to/powergenome_data.duckdb

Data Location¶

`data_location`¶

Type: String (path) Required: Yes Example: "/path/to/data"

Root directory or database file for all data tables.

Folder structure:

data_location: /Users/me/powergenome_data

Files loaded from:

/Users/me/powergenome_data/generators.csv
/Users/me/powergenome_data/demand.parquet
/Users/me/powergenome_data/fuel_prices.csv

Database file:

data_location: /Users/me/powergenome_data/model_data.duckdb

Tables loaded from database.

Column Selection¶

Limit columns loaded to reduce memory usage:

generation_table:
  table_name: generators_full.parquet
  columns:
    - plant_id
    - technology
    - capacity_mw
    - heat_rate_mmbtu_mwh
    - region
  filters:
    - - [region, 'in', ['CA_N', 'CA_S']]

Only specified columns are loaded. Useful for large datasets with many unused columns.

Example Configurations¶

Minimal Configuration¶

data_location: /data/powergenome

generation_table: generators.csv
demand_table: demand.parquet
fuel_prices_table: fuel_prices.csv

Advanced Multi-Scenario¶

data_location: /data/powergenome_v2

generation_table:
  table_name: generators_all_scenarios.parquet
  scenario: baseline
  filters:
    - - [operating_year, '<=', 2030]
  columns:
    - plant_id
    - technology
    - capacity_mw
    - heat_rate_mmbtu_mwh
    - region
    - operating_year

demand_table:
  table_name: demand_timeseries.parquet
  scenario: high_ev
  filters:
    - - [year, '=', 2030]
    - - [weather_year, 'in', [2012, 2013, 2014]]

fuel_prices_table:
  table_name: fuel_costs.parquet
  scenario: reference
  filters:
    - - [year, '>=', 2025]
    - - [year, '<=', 2050]

dg_capacity_table:
  table_name: distributed_generation.parquet
  scenario: high_solar
  filters:
    - - [technology, '=', 'rooftop_pv']

transmission_table: transmission_network.csv

Troubleshooting¶

Table Not Found¶

Error: Table 'generation' not found in data_location

Solutions:

Check data_location path is correct
Verify filename matches table_name
Ensure file has correct extension (.csv, .parquet)
For databases, verify table exists with list_tables()

Column Missing¶

Error: Column 'capacity_mw' not found in table

Solutions:

Check column names in source data (case-sensitive)
Verify data schema matches expectations
PowerGenome converts column names to snake_case automatically

Filter Errors¶

Error: Invalid filter syntax

Solutions:

Check filter uses proper DNF nested list format
Verify operators are quoted: '=' not =
Ensure values match column data types (string vs. numeric)

Model Definition: data_location path
Existing Generators: generation_table schema
Demand: demand_table, dg_capacity_table, dg_profiles_table
Fuels: fuel_prices_table
Transmission: transmission_table

Data Tables Settings¶

Table Configuration Patterns¶

Simple Configuration¶

Advanced Configuration¶

Filter Syntax¶

DNF (Disjunctive Normal Form)¶

Filter Examples¶

Scenario Convenience Filter¶

Standard Table Mappings¶

Core Data Tables¶

Generation Table¶

Demand Table¶

Fuel Prices Table¶

Transmission Table¶

Distributed Generation Tables¶

DG Capacity Table¶

DG Profiles Table¶

Renewable Resource Tables¶

Resource Group Profiles¶

Policy and Cost Tables¶

Emission Policies¶

Cost Multipliers¶

Capacity Limits and Spur Lines¶

Data File Formats¶

CSV Files¶

Parquet Files¶

Database Tables¶

Data Location¶

data_location¶

Column Selection¶

Example Configurations¶

Minimal Configuration¶

Advanced Multi-Scenario¶

Troubleshooting¶

Table Not Found¶

Column Missing¶

Filter Errors¶

Related Settings¶

`data_location`¶