Skip to content

Configure Data Tables

PowerGenome reads all input data through a central DataManager that loads tables from a folder of (CSV or Parquet) files or a (DuckDB or Sqlite) database. This guide walks through how to point PowerGenome at your data files and how to filter or select the data you need.

For the full parameter reference, see Data Tables Settings.

Prerequisites

  • Completed the Getting Started tutorial
  • A folder containing your source data files (CSV or Parquet)

Step 1: Set the data location

Tell PowerGenome where your data files live with data_location:

# settings/data.yml
data_location: /path/to/your/data_folder # Can also be /path/to/database.db

This is the base path used for all table lookups. All filenames in table settings are resolved relative to this folder.


Step 2: Configure each table

Simple configuration (filename only)

Point each setting to a file by name:

generation_table: generators.parquet
demand_table: hourly_demand.parquet
fuel_price_table: fuel_prices.csv
transmission_constraints_table: transmission_constraints.csv

PowerGenome resolves each filename inside data_location.

Advanced configuration (with filters)

Use a dictionary when you need to filter rows, select columns, or pick a specific scenario from a multi-scenario file:

generation_table:
  table_name: generators.parquet
  scenario: baseline
  filters:
    - - [capacity_mw, '>', 10]

Keys:

Key Required Description
table_name Yes Filename in data_location
scenario No Filter rows where scenario column equals this value
filters No DNF filter logic (see below)
columns No Load only these columns (useful for large files)

Step 3: Write filter expressions

DNF filter syntax

Filters use Disjunctive Normal Form: a list of AND-groups that are OR'd together.

(A AND B) OR (C AND D)
filters:
  - - [column, operator, value]   # Group 1, condition A
    - [column, operator, value]   # Group 1, condition B  (AND)
  - - [column, operator, value]   # Group 2, condition C  (OR)

Supported operators: =, ==, !=, >, >=, <, <=, in, not in

Common filter patterns

Single condition:

demand_table:
  table_name: demand.parquet
  filters:
    - - [year, '=', 2030]

AND (two conditions must both be true):

generation_table:
  table_name: generators.parquet
  filters:
    - - [operating_year, '<=', 2030]
      - [retirement_year, '>', 2030]

OR (either region is included):

demand_table:
  table_name: demand.parquet
  filters:
    - - [region, '=', 'CA_N']
    - - [region, '=', 'CA_S']

List membership (in):

generation_table:
  table_name: generators.parquet
  filters:
    - - [region, 'in', ['CA_N', 'CA_S', 'AZ']]

Step 4: Handle multi-year data

If your data file spans multiple planning years, use a year filter so each run only sees the rows for that period:

demand_table:
  table_name: demand_all_years.parquet
  filters:
    - - [year, '=', 2030]

For settings that change across planning years, use a year-keyed dictionary:

demand_table:
  2030:
    table_name: demand_all_years.parquet
    filters:
      - - [year, '=', 2030]
  2040:
    table_name: demand_all_years.parquet
    filters:
      - - [year, '=', 2040]

Step 5: Combine scenario with filters

scenario is a shorthand that adds a scenario = <value> filter. You can combine it with additional filters — PowerGenome ANDs the scenario condition into every OR-group:

generation_table:
  table_name: generators.parquet
  scenario: high_retirements
  filters:
    - - [capacity_mw, '>=', 1]

This is equivalent to:

generation_table:
  table_name: generators.parquet
  filters:
    - - [scenario, '=', 'high_retirements']
      - [capacity_mw, '>=', 1]

Step 6: Reduce memory use with column selection

For large Parquet files, load only the columns your model needs:

generation_table:
  table_name: generators.parquet
  columns:
    - plant_id
    - technology
    - capacity_mw
    - heat_rate_mmbtu_mwh
    - region
    - operating_year

Step 7: Use a DuckDB database (optional)

If your data is stored in a database (.db or .duckdb) file rather than a folder of flat files, point data_location at the database file:

data_location: /path/to/powergenome_data.db

Table names then refer to tables inside the database:

generation_table: generators
demand_table: hourly_demand

Do not include a file extension when using a database backend.


Table reference

The following settings parameters map to tables in DataManager:

Setting parameter Standard table name Typical use
generation_table generation Existing power plants
demand_table demand Hourly load time series
fuel_price_table fuel_price Fuel cost time series
transmission_constraints_table transmission_constraints Network topology
plant_region_table plant_region Plant-to-region mapping
distributed_capacity_table distributed_capacity Rooftop solar capacity
distributed_profiles_table distributed_profiles Rooftop solar generation

Troubleshooting

"Table not found" error

Check that the filename is spelled correctly and is present in data_location:

ls /path/to/your/data_folder

Verify the setting key is exact (e.g., generation_table, not generators_table).

Wrong data being loaded

Use list_tables() to confirm what DataManager sees:

from powergenome.database import get_data_manager
dm = get_data_manager()
print(dm.list_tables())

File has no extension

PowerGenome tries to auto-detect .csv or .parquet files by appending extensions, but it's best practice to always include the extension in your setting.