Configure Data Tables¶

PowerGenome reads all input data through a central DataManager that loads tables from a folder of (CSV or Parquet) files or a (DuckDB or Sqlite) database. This guide walks through how to point PowerGenome at your data files and how to filter or select the data you need.

For the full parameter reference, see Data Tables Settings.

Prerequisites¶

Completed the Getting Started tutorial
A folder containing your source data files (CSV or Parquet)

Step 1: Set the data location¶

Tell PowerGenome where your data files live with data_location:

# settings/data.yml
data_location: /path/to/your/data_folder # Can also be /path/to/database.db

This is the base path used for all table lookups. All filenames in table settings are resolved relative to this folder.

Step 2: Configure each table¶

Simple configuration (filename only)¶

Point each setting to a file by name:

generation_table: generators.parquet
demand_table: hourly_demand.parquet
fuel_price_table: fuel_prices.csv
transmission_constraints_table: transmission_constraints.csv

PowerGenome resolves each filename inside data_location.

Advanced configuration (with filters)¶

Use a dictionary when you need to filter rows, select columns, or pick a specific scenario from a multi-scenario file:

generation_table:
  table_name: generators.parquet
  scenario: baseline
  filters:
    - - [capacity_mw, '>', 10]

Keys:

Key	Required	Description
`table_name`	Yes	Filename in `data_location`
`scenario`	No	Filter rows where `scenario` column equals this value
`filters`	No	DNF filter logic (see below)
`columns`	No	Load only these columns (useful for large files)

Step 3: Write filter expressions¶

DNF filter syntax¶

Filters use Disjunctive Normal Form: a list of AND-groups that are OR'd together.

(A AND B) OR (C AND D)

filters:
  - - [column, operator, value]   # Group 1, condition A
    - [column, operator, value]   # Group 1, condition B  (AND)
  - - [column, operator, value]   # Group 2, condition C  (OR)

Supported operators: =, ==, !=, >, >=, <, <=, in, not in

Common filter patterns¶

Single condition:

demand_table:
  table_name: demand.parquet
  filters:
    - - [year, '=', 2030]

AND (two conditions must both be true):

generation_table:
  table_name: generators.parquet
  filters:
    - - [operating_year, '<=', 2030]
      - [retirement_year, '>', 2030]

OR (either region is included):

demand_table:
  table_name: demand.parquet
  filters:
    - - [region, '=', 'CA_N']
    - - [region, '=', 'CA_S']

List membership (in):

generation_table:
  table_name: generators.parquet
  filters:
    - - [region, 'in', ['CA_N', 'CA_S', 'AZ']]

Step 4: Handle multi-year data¶

If your data file spans multiple planning years, use a year filter so each run only sees the rows for that period:

demand_table:
  table_name: demand_all_years.parquet
  filters:
    - - [year, '=', 2030]

For settings that change across planning years, use a year-keyed dictionary:

demand_table:
  2030:
    table_name: demand_all_years.parquet
    filters:
      - - [year, '=', 2030]
  2040:
    table_name: demand_all_years.parquet
    filters:
      - - [year, '=', 2040]

Step 5: Combine `scenario` with filters¶

scenario is a shorthand that adds a scenario = <value> filter. You can combine it with additional filters — PowerGenome ANDs the scenario condition into every OR-group:

generation_table:
  table_name: generators.parquet
  scenario: high_retirements
  filters:
    - - [capacity_mw, '>=', 1]

This is equivalent to:

generation_table:
  table_name: generators.parquet
  filters:
    - - [scenario, '=', 'high_retirements']
      - [capacity_mw, '>=', 1]

Step 6: Reduce memory use with column selection¶

For large Parquet files, load only the columns your model needs:

generation_table:
  table_name: generators.parquet
  columns:
    - plant_id
    - technology
    - capacity_mw
    - heat_rate_mmbtu_mwh
    - region
    - operating_year

Step 7: Use a DuckDB database (optional)¶

If your data is stored in a database (.db or .duckdb) file rather than a folder of flat files, point data_location at the database file:

data_location: /path/to/powergenome_data.db

Table names then refer to tables inside the database:

generation_table: generators
demand_table: hourly_demand

Do not include a file extension when using a database backend.

Table reference¶

The following settings parameters map to tables in DataManager:

Setting parameter	Standard table name	Typical use
`generation_table`	`generation`	Existing power plants
`demand_table`	`demand`	Hourly load time series
`fuel_price_table`	`fuel_price`	Fuel cost time series
`transmission_constraints_table`	`transmission_constraints`	Network topology
`plant_region_table`	`plant_region`	Plant-to-region mapping
`distributed_capacity_table`	`distributed_capacity`	Rooftop solar capacity
`distributed_profiles_table`	`distributed_profiles`	Rooftop solar generation

Troubleshooting¶

"Table not found" error

Check that the filename is spelled correctly and is present in data_location:

ls /path/to/your/data_folder

Verify the setting key is exact (e.g., generation_table, not generators_table).

Wrong data being loaded

Use list_tables() to confirm what DataManager sees:

from powergenome.database import get_data_manager
dm = get_data_manager()
print(dm.list_tables())

File has no extension

PowerGenome tries to auto-detect .csv or .parquet files by appending extensions, but it's best practice to always include the extension in your setting.

Data Tables Settings Reference: Full parameter documentation and filter syntax
Configure Settings: Settings file organization
Run Multi-Scenario Studies: Varying data sources across scenarios