Title: | Methods for Smart Meter Data Analysis |
---|---|
Description: | Methods for analysis of energy consumption data (electricity, gas, water) at different data measurement intervals. The package provides feature extraction methods and algorithms to prepare data for data mining and machine learning applications. Detailed descriptions of the methods and their application can be found in Hopf (2019, ISBN:978-3-86309-669-4) "Predictive Analytics for Energy Efficiency and Energy Retailing" <doi:10.20378/irbo-54833> and Hopf et al. (2018) <doi:10.1007/s12525-018-0290-9> "Enhancing energy efficiency in the residential sector with smart meter data analytics". |
Authors: | Konstantin Hopf [aut, cre] |
Maintainer: | Konstantin Hopf |
License: | MIT + file LICENSE |
Version: | 1.0.3 |
Built: | 2025-02-12 04:41:59 UTC |
Source: | https://github.com/cran/SmartMeterAnalytics |
Computes features from daily electricity, gas, and water consumption time series.
calc_features_daily_multipleTS( el = NULL, gas = NULL, wa = NULL, rowname = NULL, cor.useNA = "complete.obs" )
el | daily electricity consumption |
gas | daily gas consumption |
wa | daily water consumption |
rowname | the name of the consumer (e.g., a household ID in a study database) |
cor.useNA | an optional character string for the cor function (as in its use argument), specifying the method for computing covariances in the presence of missing values; defaults to "complete.obs" |
a data.frame with the feature values as columns and rowname as the row name
Konstantin Hopf
Hopf, K. (2019). Predictive Analytics for Energy Efficiency and Energy Retailing (1st ed.). Bamberg: University of Bamberg. https://doi.org/10.20378/irbo-54833
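The cor.useNA option is documented as a method string for the cor function. Assuming it is forwarded to the use argument of base R's cor() (the package internals are not shown here), this base-R sketch with made-up daily readings shows what the choice changes when one reading is missing:

```r
# Seven illustrative daily readings; the gas reading on day 3 is missing
el  <- c(10, 12, 11, 14, 13, 9, 8)
gas <- c(20, 23, NA, 27, 25, 18, 17)

# use = "everything" (the default of cor) propagates the NA into the result
r_everything <- cor(el, gas, use = "everything")

# use = "complete.obs" (the default of cor.useNA) drops day 3 before correlating
r_complete <- cor(el, gas, use = "complete.obs")

print(c(everything = r_everything, complete.obs = r_complete))
```

A single missing day makes the "everything" correlation NA, while "complete.obs" still yields a value computed from the remaining six days.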
Calculates features from one environmental time-series variable and smart meter data
calc_features_weather(SMD, WEATHER, rowname = NULL)
SMD | the load trace for one week (vector with 672 or 336 elements, i.e., 15-minute or 30-minute readings) |
WEATHER | weather observations (e.g., temperature) as 30-minute readings (vector with 336 elements) |
rowname | the row name of the current data point |
Konstantin Hopf, Ilya Kozlovskiy
Hopf, K. (2019). Predictive Analytics for Energy Efficiency and Energy Retailing (1st ed.). Bamberg: University of Bamberg. https://doi.org/10.20378/irbo-54833
Hopf, K., Sodenkamp, M., Kozlovskiy, I., & Staake, T. (2014). Feature extraction and filtering for household classification based on smart electricity meter data. Computer Science - Research and Development, 31(3), 141–148. https://doi.org/10.1007/s00450-014-0294-4
Hopf, K., Sodenkamp, M., & Staake, T. (2018). Enhancing energy efficiency in the residential sector with smart meter data analytics. Electronic Markets, 28(4). https://doi.org/10.1007/s12525-018-0290-9
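SMD may hold 672 (15-minute) or 336 (30-minute) values, whereas WEATHER has 336 half-hourly values, so both series must share one resolution before they can be related. The following base-R sketch shows one way to do that and computes a single example feature, the Pearson correlation between load and temperature. The helper to_30min and the choice of feature are illustrative assumptions, not the package's documented feature set:

```r
# Hypothetical helper (not part of the package): sum pairs of 15-minute
# readings so that a 672-element weekly trace matches 336 30-minute readings
to_30min <- function(smd) {
  if (length(smd) == 672) {
    # matrix() fills column-wise, so each column holds two consecutive readings
    smd <- colSums(matrix(smd, nrow = 2))
  }
  stopifnot(length(smd) == 336)
  smd
}

set.seed(1)
smd  <- runif(n = 672, min = 0, max = 2)                     # 15-minute load trace
temp <- 10 + 5 * sin(seq(0, 2 * pi * 7, length.out = 336))   # synthetic temperature

smd30 <- to_30min(smd)
# One example of a load/weather feature: the Pearson correlation
load_temp_cor <- cor(smd30, temp)
print(load_temp_cor)
```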
Calculates features from 15-min smart meter data
calc_features15_consumption( B, rowname = NULL, featsCoarserGranularity = FALSE, replace_NA_with_defaults = TRUE )
B | a vector of 4*24*7 = 672 measurements: four 15-minute readings per hour, 24 hours a day, for seven days |
rowname | the row name of the resulting feature vector |
featsCoarserGranularity | should features of coarser granularity levels also be calculated (TRUE/FALSE) |
replace_NA_with_defaults | if TRUE, replaces missing (NA) or infinite values that may arise during the calculation with default values |
a data.frame with the calculated features as columns and rowname as the row name, if given
Konstantin Hopf
Hopf, K. (2019). Predictive Analytics for Energy Efficiency and Energy Retailing (1st ed.). Bamberg: University of Bamberg. https://doi.org/10.20378/irbo-54833
Hopf, K., Sodenkamp, M., Kozlovskiy, I., & Staake, T. (2014). Feature extraction and filtering for household classification based on smart electricity meter data. Computer Science - Research and Development, 31(3), 141–148. https://doi.org/10.1007/s00450-014-0294-4
Hopf, K., Sodenkamp, M., & Staake, T. (2018). Enhancing energy efficiency in the residential sector with smart meter data analytics. Electronic Markets, 28(4). https://doi.org/10.1007/s12525-018-0290-9
library(SmartMeterAnalytics)
# Create a random time series of 15-minute smart meter data (672 measurements per week)
smd <- runif(n=672, min=0, max=2)
# Calculate the smart meter data features
calc_features15_consumption(smd)
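The 672 values of B form a grid of 7 days by 96 quarter-hour slots. A minimal base-R sketch of that layout and of a few statistics of the kind such features build on, assuming B starts at midnight of the first day; the feature names are hypothetical and are not those returned by calc_features15_consumption:

```r
set.seed(42)
B <- runif(n = 672, min = 0, max = 2)   # one week of 15-minute readings

# Assumption: B starts at 00:00 on the first day and is in chronological order.
# Rows = 96 quarter-hour slots of a day, columns = 7 days.
week <- matrix(B, nrow = 96, ncol = 7)

daily_total <- colSums(week)    # consumption per day (7 values)
day_profile <- rowMeans(week)   # average load in each quarter-hour slot (96 values)

# A few illustrative statistics (hypothetical names, not the package's)
feats <- data.frame(
  mean_load   = mean(B),
  max_load    = max(B),
  min_to_mean = min(B) / mean(B),
  peak_slot   = which.max(day_profile)   # slot 1 = 00:00-00:15
)
print(feats)
```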
Calculates features from 30-min smart meter data
calc_features30_consumption( B, rowname = NULL, featsCoarserGranularity = FALSE, replace_NA_with_defaults = TRUE )
B | a vector of 2*24*7 = 336 measurements: two 30-minute readings per hour, 24 hours a day, for seven days |
rowname | the row name of the resulting feature vector |
featsCoarserGranularity | should features of coarser granularity levels also be calculated (TRUE/FALSE) |
replace_NA_with_defaults | if TRUE, replaces missing (NA) or infinite values that may arise during the calculation with default values |
a data.frame with the calculated features as columns and rowname as the row name, if given
Konstantin Hopf
Hopf, K. (2019). Predictive Analytics for Energy Efficiency and Energy Retailing (1st ed.). Bamberg: University of Bamberg. https://doi.org/10.20378/irbo-54833
Hopf, K., Sodenkamp, M., Kozlovskiy, I., & Staake, T. (2014). Feature extraction and filtering for household classification based on smart electricity meter data. Computer Science - Research and Development, 31(3), 141–148. https://doi.org/10.1007/s00450-014-0294-4
Hopf, K., Sodenkamp, M., & Staake, T. (2018). Enhancing energy efficiency in the residential sector with smart meter data analytics. Electronic Markets, 28(4). https://doi.org/10.1007/s12525-018-0290-9
Beckel, C., Sadamori, L., Staake, T., & Santini, S. (2014). Revealing household characteristics from smart meter data. Energy, 78, 397–410. https://doi.org/10.1016/j.energy.2014.10.025
library(SmartMeterAnalytics)
# Create a random time series of 30-minute smart meter data (336 measurements per week)
smd <- runif(n=336, min=0, max=2)
# Calculate the smart meter data features
calc_features30_consumption(smd)
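featsCoarserGranularity suggests that features are also computed on the input aggregated to coarser resolutions. The sketch below only illustrates that aggregation step for 336 half-hourly readings; the helper aggregate_blocks is hypothetical, and how the package aggregates internally is an assumption:

```r
set.seed(7)
B <- runif(n = 336, min = 0, max = 2)   # one week of 30-minute readings

# Hypothetical helper: sum consecutive blocks of k readings
aggregate_blocks <- function(x, k) {
  stopifnot(length(x) %% k == 0)
  colSums(matrix(x, nrow = k))
}

hourly <- aggregate_blocks(B, 2)    # 168 hourly values
daily  <- aggregate_blocks(B, 48)   # 7 daily values

# The same statistic computed at several granularities
peaks <- sapply(list(half_hour = B, hour = hourly, day = daily), max)
print(peaks)
```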
Calculates features from 60-min smart meter data
calc_features60_consumption(B, rowname = NULL, replace_NA_with_defaults = TRUE)
B | a vector of 24*7 = 168 measurements: one reading per hour, 24 hours a day, for seven days |
rowname | the row name of the resulting feature vector |
replace_NA_with_defaults | if TRUE, replaces missing (NA) or infinite values that may arise during the calculation with default values |
a data.frame with the calculated features as columns and rowname as the row name, if given
Konstantin Hopf
library(SmartMeterAnalytics)
# Create a random time series of 60-minute smart meter data (168 measurements per week)
smd <- runif(n=168, min=0, max=2)
# Calculate the smart meter data features
calc_features60_consumption(smd)
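Ratio features are a typical source of the NA, NaN, and infinite values that replace_NA_with_defaults deals with. In this sketch the feature names and the helper replace_nonfinite are hypothetical, and 0 as the default value is an assumption (the documentation of calc_featuresnt_consumption mentions zero values as standard values):

```r
# Two weekly hourly traces (168 values each): an all-zero week and a week
# whose minimum is 0 while the maximum is not
B_zero <- rep(0, 168)
B_peak <- c(rep(0, 167), 2)

# Hypothetical ratio features
ratio_features <- function(B) {
  c(min_to_mean = min(B) / mean(B),   # NaN when all readings are 0
    max_to_min  = max(B) / min(B))    # Inf when min is 0 and max > 0 (NaN if both are 0)
}
raw <- rbind(zero = ratio_features(B_zero), peak = ratio_features(B_peak))

# Hypothetical helper: replace NA, NaN, and +/-Inf by a default value
replace_nonfinite <- function(x, default = 0) {
  x[!is.finite(x)] <- default
  x
}
clean <- replace_nonfinite(raw)
print(raw)
print(clean)
```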
Calculates consumption features from weekly consumption only
calc_featuresco_consumption(B, rowname = NULL)
B | a vector of any length with measurements |
rowname | the row name of the resulting feature vector |
a data.frame with the calculated features as columns and rowname as the row name, if given
Konstantin Hopf
Hopf, K. (2019). Predictive Analytics for Energy Efficiency and Energy Retailing (1st ed.). Bamberg: University of Bamberg. https://doi.org/10.20378/irbo-54833
Hopf, K., Sodenkamp, M., Kozlovskiy, I., & Staake, T. (2014). Feature extraction and filtering for household classification based on smart electricity meter data. Computer Science - Research and Development, 31(3), 141–148. https://doi.org/10.1007/s00450-014-0294-4
Hopf, K., Sodenkamp, M., & Staake, T. (2018). Enhancing energy efficiency in the residential sector with smart meter data analytics. Electronic Markets, 28(4). https://doi.org/10.1007/s12525-018-0290-9
Calculates consumption features from daily smart meter data
calc_featuresda_consumption( B, rowname = NULL, featsCoarserGranularity = FALSE, replace_NA_with_defaults = TRUE )
B | a vector of seven daily measurements (one week) |
rowname | the row name of the resulting feature vector |
featsCoarserGranularity | should features of coarser granularity levels also be calculated (TRUE/FALSE) |
replace_NA_with_defaults | if TRUE, replaces missing (NA) or infinite values that may arise during the calculation with default values |
a data.frame with the calculated features as columns and rowname as the row name, if given
Konstantin Hopf
Hopf, K. (2019). Predictive Analytics for Energy Efficiency and Energy Retailing (1st ed.). Bamberg: University of Bamberg. https://doi.org/10.20378/irbo-54833
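With only seven daily values, features mostly compare days with each other. A base-R sketch with illustrative values, assuming B[1] is Monday (the order is not stated for this function) and using hypothetical feature names:

```r
# Illustrative daily consumption for one week; assumption: B[1] = Monday, B[7] = Sunday
B <- c(8.1, 7.9, 8.4, 8.0, 9.2, 12.5, 13.1)

weekday_mean <- mean(B[1:5])
weekend_mean <- mean(B[6:7])

# Illustrative features (hypothetical names)
feats <- c(
  week_total      = sum(B),
  weekend_ratio   = weekend_mean / weekday_mean,
  day_variability = sd(B) / mean(B)   # coefficient of variation across days
)
print(round(feats, 3))
```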
Calculates features from HT (high tariff) and NT (low tariff) consumption; the division into HT and NT is taken from the input smart meter data
calc_featureshtnt_consumption2( HTCons, NTCons, rowname = NULL, featsCoarserGranularity = FALSE )
HTCons | a vector with 7 measurements of HT consumption in one week (beginning with Monday) |
NTCons | a vector with 7 measurements of NT consumption in one week (beginning with Monday) |
rowname | the row name of the resulting feature vector |
featsCoarserGranularity | should features of coarser granularity levels also be calculated (TRUE/FALSE) |
Konstantin Hopf
Hopf, K. (2019). Predictive Analytics for Energy Efficiency and Energy Retailing (1st ed.). Bamberg: University of Bamberg. https://doi.org/10.20378/irbo-54833
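Given separate HT and NT vectors, the most direct features relate the two tariffs to each other. A base-R sketch with illustrative values and hypothetical feature names; these are not the package's feature definitions:

```r
# Illustrative weekly HT and NT consumption, Monday first (as documented)
HTCons <- c(6.0, 5.5, 6.2, 5.8, 6.4, 8.0, 8.5)
NTCons <- c(2.0, 2.5, 1.8, 2.2, 1.6, 3.0, 3.5)

total    <- HTCons + NTCons
nt_share <- NTCons / total   # share of NT consumption per day

feats <- c(
  nt_share_week    = sum(NTCons) / sum(total),
  nt_share_weekend = sum(NTCons[6:7]) / sum(total[6:7]),
  ht_nt_ratio      = sum(HTCons) / sum(NTCons)
)
print(round(feats, 3))
```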
The division into HT (high tariff) and NT (low tariff) consumption is derived from the input smart meter data
calc_featuresnt_consumption( B, rowname = NULL, featsCoarserGranularity = FALSE, replace_NA_with_defaults = TRUE )
B | a vector of 2*24*7 = 336 measurements: two 30-minute readings per hour, 24 hours a day, for seven days |
rowname | the row name of the resulting feature vector |
featsCoarserGranularity | should features of coarser granularity levels also be calculated (TRUE/FALSE) |
replace_NA_with_defaults | an optional boolean argument specifying whether missing values are replaced with standard values (i.e., zero values) |
HT consumption is the consumption between 07:00 and 22:00; the remaining hours count as NT.
Konstantin Hopf
Hopf, K. (2019). Predictive Analytics for Energy Efficiency and Energy Retailing (1st ed.). Bamberg: University of Bamberg. https://doi.org/10.20378/irbo-54833
Hopf, K., Sodenkamp, M., Kozlovskiy, I., & Staake, T. (2014). Feature extraction and filtering for household classification based on smart electricity meter data. Computer Science - Research and Development, 31(3), 141–148. https://doi.org/10.1007/s00450-014-0294-4
Hopf, K., Sodenkamp, M., & Staake, T. (2018). Enhancing energy efficiency in the residential sector with smart meter data analytics. Electronic Markets, 28(4). https://doi.org/10.1007/s12525-018-0290-9
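The HT window of 07:00 to 22:00 maps onto half-hourly slots as follows, assuming slot 1 of B covers 00:00 to 00:30 on the first day (the documentation does not state the start time). This base-R sketch only splits the week into daily HT and NT totals:

```r
set.seed(3)
B <- runif(n = 336, min = 0, max = 2)   # one week of 30-minute readings

# Assumption: slot s of a day starts (s - 1) * 30 minutes after midnight,
# so HT (07:00-22:00) covers slots 15 to 44
slot_start_min <- (seq_len(48) - 1) * 30
is_ht <- slot_start_min >= 7 * 60 & slot_start_min < 22 * 60

week   <- matrix(B, nrow = 48, ncol = 7)   # rows = half-hour slots, columns = days
ht_day <- colSums(week[is_ht, ])           # daily HT consumption
nt_day <- colSums(week[!is_ht, ])          # daily NT consumption
print(rbind(HT = ht_day, NT = nt_day))
```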
Encodes a p-value as significance stars: '.' for p-value < 0.1, '*' for < 0.05, '**' for < 0.01, '***' for < 0.001
encode_p_val_stars(pval)
pval | the p-value |
a character string with the encoding
Konstantin Hopf
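The encoding rule maps directly onto findInterval(). This is a hypothetical re-implementation (p_stars is not a package function), and returning an empty string for p-values of 0.1 or more is an assumption, since the documentation does not cover that case:

```r
# Hypothetical re-implementation of the encoding rule (strict "<" thresholds);
# p-values of 0.1 and above are assumed to map to an empty string
p_stars <- function(pval) {
  symbols <- c("***", "**", "*", ".", "")
  # findInterval() counts how many thresholds are <= pval
  symbols[findInterval(pval, c(0.001, 0.01, 0.05, 0.1)) + 1]
}

print(p_stars(c(0.0004, 0.001, 0.03, 0.05, 0.2)))
```

Because the thresholds are strict, a p-value of exactly 0.05 receives '.' rather than '*'.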
Creates a set of all combinations of features
features_all_subsets(set)
set | vector of available features to be combined |
a list of subsets of the input vector
Konstantin Hopf, Ilya Kozlovskiy
library(SmartMeterAnalytics)
features_all_subsets(c("A", "B", "C"))
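The subsets can be generated with base R's combn(). In this sketch the helper all_subsets is hypothetical and returns only non-empty subsets; whether features_all_subsets also includes the empty set is not documented:

```r
# Hypothetical helper: all non-empty subsets of `set`, smallest subsets first
all_subsets <- function(set) {
  unlist(
    lapply(seq_along(set), function(k) combn(set, k, simplify = FALSE)),
    recursive = FALSE
  )
}

subsets <- all_subsets(c("A", "B", "C"))
print(length(subsets))   # 2^3 - 1 non-empty subsets
```

For n features this yields 2^n - 1 subsets, so the list grows quickly: 20 features already give more than a million subsets.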
Returns the date of a weekday in an ISO 8601 week. Example date formats defined by ISO 8601:
* Single days are written as yyyy-mm-dd (y: year, m: month, d: day); e.g., 2016-07-19
* Weeks are written as yyyy-Www; e.g., 2016-W29
getDay_ISO8601_week( theweek, day = c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun") )
theweek | the string with the week name (e.g., 2016-W29) |
day | the weekday to be returned |
The function uses format and as.Date internally, which cannot handle ISO 8601 week formats. A workaround is therefore implemented, which may behave unexpectedly in future versions.
the date of the weekday in the given week
Konstantin Hopf
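Base R does not parse ISO week numbers on input, so a workaround is needed. The sketch below uses one standard workaround (4 January always falls in ISO week 1); it is not necessarily the workaround the package implements, and iso_week_day is a hypothetical name:

```r
# Hypothetical helper (not the package function): date of `day` in ISO week "yyyy-Www"
iso_week_day <- function(theweek,
                         day = c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")) {
  day  <- match.arg(day)
  year <- as.integer(substr(theweek, 1, 4))
  week <- as.integer(sub("^[0-9]{4}-W", "", theweek))
  # 4 January always lies in ISO week 1
  jan4 <- as.Date(sprintf("%d-01-04", year))
  # %u gives the ISO weekday: Monday = 1, ..., Sunday = 7
  monday_week1 <- jan4 - (as.integer(format(jan4, "%u")) - 1)
  offset <- match(day, c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")) - 1
  monday_week1 + 7 * (week - 1) + offset
}

print(iso_week_day("2016-W29", "Tue"))
```

The week-1 Monday can fall in the previous calendar year (e.g., 2015-W01 starts on 2014-12-29), which is exactly the case naive year-based arithmetic gets wrong.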
Returns the date of a weekday in a US week. Date formats:
* Single days are written as yyyy-mm-dd (ISO 8601; y: year, m: month, d: day); e.g., 2016-07-19
* Weeks are written as yyyy-WUww; e.g., 2016-WU29 (US convention: Sunday is the first day of the week, typically with the first Sunday of the year as day 1 of week 1)
getDay_US_week( theweek, day = c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun") )
theweek | the string with the week name (e.g., 2016-WU29) |
day | the weekday to be returned |
the date of the weekday in the given week
Konstantin Hopf
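For the US convention, the dates can be derived from the first Sunday of the year. A sketch with the hypothetical helper us_week_day, assuming weeks run from Sunday to Saturday as in the %U format of strftime; it is not necessarily how getDay_US_week works internally:

```r
# Hypothetical helper (not the package function): date of `day` in US week "yyyy-WUww"
us_week_day <- function(theweek,
                        day = c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")) {
  day  <- match.arg(day)
  year <- as.integer(substr(theweek, 1, 4))
  week <- as.integer(sub("^[0-9]{4}-WU", "", theweek))
  jan1 <- as.Date(sprintf("%d-01-01", year))
  # %w gives the weekday as 0-6 with Sunday = 0; week 1 starts on the first Sunday
  first_sunday <- jan1 + (7 - as.integer(format(jan1, "%w"))) %% 7
  # US weeks start on Sunday, so Sunday has offset 0 and Saturday offset 6
  offset <- match(day, c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat")) - 1
  first_sunday + 7 * (week - 1) + offset
}

print(us_week_day("2016-WU29", "Tue"))
```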
Interpolate missing readings
interpolate_missingReadings(timeseries, option = "linear", ...)
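timeseries | numeric vector (vector) or time series (ts) object in which missing values are replaced |
option | the interpolation algorithm to be used; the default "linear" uses approx, and spline or stinterp interpolation are the alternatives |
... | additional parameters passed through to the approx or spline interpolation functions |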
Missing values are replaced by values from approx, spline, or stinterp interpolation.
a vector (vector) or time series (ts) object, matching the type of the input timeseries
The implementation is adapted from the package imputeTS, function na.interpolation (https://github.com/SteffenMoritz/imputeTS/blob/master/R/na.interpolation.R)
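A minimal sketch of the "linear" option, assuming it behaves like approx()-based imputation with rule = 2 for leading and trailing gaps (an assumption; interp_linear is a hypothetical name, not the package function):

```r
# Hypothetical helper: linear interpolation of NA values with base R's approx()
interp_linear <- function(x) {
  ok <- !is.na(x)
  if (sum(ok) < 2) stop("at least two non-NA values are needed")
  # rule = 2 carries the first/last observed value into leading/trailing gaps
  x[!ok] <- approx(x = which(ok), y = x[ok], xout = which(!ok), rule = 2)$y
  x
}

print(interp_linear(c(1, NA, 3, NA, NA, 6, NA)))
```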
Cleans up NA and Inf values in a data.frame or matrix, which is useful when complete datasets are needed
naInf_omit(V)
V | a data.frame or matrix that has to be cleaned |
a cleaned version of the data.frame or matrix
Konstantin Hopf
replaceNAsFeatures, remove_empty_features
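The documentation does not say exactly how the data are cleaned. Assuming, as the name suggests, behaviour like na.omit() extended to infinite values (rows containing NA, NaN, or Inf are dropped), a base-R sketch with the hypothetical helper drop_na_inf_rows looks like this:

```r
# Hypothetical helper: drop every row that contains an NA, NaN, or +/-Inf value
drop_na_inf_rows <- function(V) {
  df  <- as.data.frame(V)
  bad <- is.na(df)                       # logical matrix: rows x columns
  for (j in which(vapply(df, is.numeric, logical(1)))) {
    bad[, j] <- bad[, j] | is.infinite(df[[j]])
  }
  V[rowSums(bad) == 0, , drop = FALSE]
}

m <- matrix(c(1, NA, 3, 4, 5, -Inf), ncol = 2)
print(drop_na_inf_rows(m))   # only the first row is complete and finite
```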
Determines two clusters of high and low consumption times (e.g., non-occupancy during holidays)
occupancy_cluster(consumption, n_days_check = 4, sds_between_clusters = 1.5)
consumption | the consumption time series |
n_days_check | number of consecutive days that should be considered as a minimal cluster |
sds_between_clusters | the minimum distance between the cluster centers, in multiples of the standard deviation (decimal number) |
a list with the cluster assignments and the k-means clustering model
Konstantin Hopf
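A base-R sketch of the idea with illustrative daily values: two k-means clusters, a check that the centres are at least sds_between_clusters standard deviations apart (measured here with the overall standard deviation, an assumption), and a search for runs of at least n_days_check consecutive low-consumption days. It is not the package's implementation:

```r
# Illustrative daily consumption: 20 occupied days, a 7-day vacation, 10 occupied days
daily <- c(rep(c(9.5, 10.5, 10, 11, 9), 4),
           c(2.0, 1.8, 2.2, 2.1, 1.9, 2.0, 2.3),
           rep(c(10, 10.4, 9.6, 10.2, 9.8), 2))
n_days_check <- 4
sds_between_clusters <- 1.5

set.seed(1)
km <- kmeans(daily, centers = 2, nstart = 10)

# Assumption: separation is measured against the overall standard deviation
centers_far_apart <- abs(diff(as.vector(km$centers))) >= sds_between_clusters * sd(daily)

# Runs of consecutive days assigned to the low-consumption cluster
is_low   <- km$cluster == which.min(km$centers)
runs     <- rle(is_low)
starts   <- cumsum(runs$lengths) - runs$lengths + 1
long_low <- which(runs$values & runs$lengths >= n_days_check)
print(data.frame(start_day = starts[long_low], n_days = runs$lengths[long_low]))
```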
Returns a vector of the feature names that can be calculated by methods in the *SmartMeterAnalytics* package, according to the selected feature set configuration
prepareFeatureSet( features.granularity = NA, features.w_adj = FALSE, features.anonymized = FALSE, features.categorical = FALSE, features.geo = "osm-v1", features.temperature = TRUE, features.weather = TRUE, features.neighborhood = FALSE )
features.granularity | Character: the granularity of the input data, either "15-min" (only 15-minute features), "30-min" (only 30-minute features), "all_30min_to_week" (all features on daily, weekly, hourly, ..., up to 30-minute data), "all_15_week" (all features up to 15-minute data), or "week" (only the consumption of one week as a single feature) |
features.w_adj | Boolean: whether the features are weather-adjusted with DiD-Class (NOT IMPLEMENTED YET!) |
features.anonymized | Boolean: whether anonymized geographic features are used (NOT IMPLEMENTED YET!) |
features.categorical | Boolean: whether categorical features are used in addition to numeric ones (if FALSE, only numeric features are used) |
features.geo | Character: version of the geographic feature set (either "none", "osm-v1", or "osm-v2") |
features.temperature | Boolean: whether temperature features should be included |
features.weather | Boolean: whether other weather features should be included |
features.neighborhood | Boolean: whether neighborhood features should be included |
a character vector of feature names
Konstantin Hopf
Hopf, K. (2019). Predictive Analytics for Energy Efficiency and Energy Retailing (1st ed.). Bamberg: University of Bamberg. https://doi.org/10.20378/irbo-54833
Hopf, K., Sodenkamp, M., Kozlovskiy, I., & Staake, T. (2014). Feature extraction and filtering for household classification based on smart electricity meter data. Computer Science - Research and Development, 31(3), 141–148. https://doi.org/10.1007/s00450-014-0294-4
Hopf, K., Sodenkamp, M., & Staake, T. (2018). Enhancing energy efficiency in the residential sector with smart meter data analytics. Electronic Markets, 28(4). https://doi.org/10.1007/s12525-018-0290-9
Beckel, C., Sadamori, L., Staake, T., & Santini, S. (2014). Revealing household characteristics from smart meter data. Energy, 78, 397–410. https://doi.org/10.1016/j.energy.2014.10.025
Removes from a list of variable names those variables that contain only (or a large portion of) NA values or, if numeric, have zero bandwidth, and returns the remaining variable names.
remove_empty_features( all.features, dataset, percentage_NA_allowed = NA, bandwidth = (.Machine$double.eps^0.5), verbose = FALSE )
all.features | a character vector with the column names of dataset that are to be checked |
dataset | the dataset as a data.frame |
percentage_NA_allowed | the percentage of missing values per variable that is allowed without removing the feature; all features with a higher share of NA values are excluded |
bandwidth | the length of the interval that the values of a variable must exceed for the variable not to be removed; by default, the square root of .Machine$double.eps (about 1.5e-8) |
verbose | boolean: whether debug messages should be printed when a variable is removed from the list (uses the futile.logger package) |
The function checks each given column for its portion of NA values. If the portion of NA or Inf values exceeds percentage_NA_allowed, the column name is removed from the variable set. In addition, all numeric variables are checked for almost zero bandwidth (a range of values that does not exceed bandwidth); such variables are removed as well.
a vector of the variable names that are not considered empty
Konstantin Hopf
naInf_omit, replaceNAsFeatures
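A sketch of the two checks described above, with the hypothetical helper keep_nonempty. It treats percentage_NA_allowed as a fraction between 0 and 1 and uses 0.5 as the default; both are assumptions, since the package default is NA and the scale is not documented:

```r
# Hypothetical sketch; percentage_NA_allowed is treated as a fraction in [0, 1]
keep_nonempty <- function(all.features, dataset, percentage_NA_allowed = 0.5,
                          bandwidth = .Machine$double.eps^0.5) {
  keep <- vapply(all.features, function(f) {
    x   <- dataset[[f]]
    bad <- is.na(x)
    if (is.numeric(x)) bad <- bad | is.infinite(x)
    if (all(bad) || mean(bad) > percentage_NA_allowed) return(FALSE)
    # numeric columns must span an interval longer than `bandwidth`
    if (is.numeric(x)) return(diff(range(x[!bad])) > bandwidth)
    TRUE
  }, logical(1))
  all.features[keep]
}

d <- data.frame(a = c(1, 2, 3, 4), b = c(NA, NA, NA, 1),
                c = rep(5, 4), e = c("x", "y", "z", "w"))
print(keep_nonempty(names(d), d))   # "b" has too many NAs, "c" is constant
```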
Takes a data.frame and replaces NA values in the given feature columns with a replacement value.
replaceNAsFeatures(indata, features, replacement = 0)
indata | the data.frame in which NA values are replaced |
features | a vector of variable names (must be column names of indata) |
replacement | the value that NA values are replaced with (zero by default) |
the modified data.frame with replaced values
Konstantin Hopf
naInf_omit, remove_empty_features
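A minimal base-R sketch of the replacement with the hypothetical helper replace_na_cols, assuming only the columns named in features are changed:

```r
# Hypothetical sketch: replace NA values only in the listed feature columns
replace_na_cols <- function(indata, features, replacement = 0) {
  for (f in features) {
    indata[[f]][is.na(indata[[f]])] <- replacement
  }
  indata
}

d <- data.frame(a = c(1, NA), b = c(NA, 2), c = c(NA, NA))
print(replace_na_cols(d, features = c("a", "b")))   # column c keeps its NA values
```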
Performs oversampling with SMOTE by creating new synthetic instances.
smote( Variables, Classes, subset_use = NULL, k = 5, use_nearest = TRUE, proportions = 0.9, equalise_with_undersampling = FALSE, safe = FALSE )
Variables | the data.frame of independent variables used to create new instances |
Classes | the class labels of the prediction problem |
subset_use | if given, only this subset is used for oversampling; if NULL, all data are used |
k | the number of neighbours used to generate new instances |
use_nearest | should only the nearest neighbours be used? (very slow) |
proportions | the proportion (relative to the size of the largest class) up to which the classes are equalized |
equalise_with_undersampling | should additional undersampling be performed? |
safe | should a safe version of SMOTE be used? |
SMOTE is used to generate synthetic data points of a smaller class, for example to overcome the problem of imbalanced classes in classification.
a list containing a new data.frame of independent variables and the new class labels
Ilya Kozlovskiy, Konstantin Hopf
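The core of SMOTE creates a synthetic instance on the line segment between a minority-class instance and one of its k nearest neighbours. A base-R sketch of that interpolation step only, with the hypothetical helper smote_sketch; it does not cover the subset_use, proportions, undersampling, or safe options of smote:

```r
# Hypothetical sketch of the SMOTE interpolation step for one minority class
smote_sketch <- function(X, n_new, k = 5) {
  X <- as.matrix(X)
  k <- min(k, nrow(X) - 1)
  d <- as.matrix(dist(X))   # pairwise Euclidean distances
  diag(d) <- Inf            # an instance is not its own neighbour
  out <- matrix(NA_real_, nrow = n_new, ncol = ncol(X),
                dimnames = list(NULL, colnames(X)))
  for (i in seq_len(n_new)) {
    base <- sample.int(nrow(X), 1)
    nn   <- order(d[base, ])[seq_len(k)]   # k nearest neighbours of `base`
    nb   <- nn[sample.int(k, 1)]           # pick one of them at random
    gap  <- runif(1)                       # position on the connecting segment
    out[i, ] <- X[base, ] + gap * (X[nb, ] - X[base, ])
  }
  out
}

set.seed(3)
minority  <- data.frame(x = c(0, 1, 0, 1), y = c(0, 0, 1, 1))
synthetic <- smote_sketch(minority, n_new = 10, k = 2)
print(head(synthetic))
```

Because every synthetic point lies between two existing minority instances, all ten points stay inside the unit square spanned by the example data.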