Size and Power using Stata: a Simple Example

During my seminars on Applied Econometrics with Stata in the Faculty of Economics and Management at the University of Strasbourg, I always tell my students that the concepts of size and power are very important in statistics and in econometrics. Indeed, size (Type I error) and power (1-Type II error) analyses are often used to evaluate the statistical properties of a new test or of a newly proposed estimator. Size and power are affected by the sample size, but also by the effect size. The objective for the econometrician is to achieve high power and a small size with few observations. But can we observe a trade-off between size and power in small samples?

I asked an AI research assistant whether it could define a power analysis:

In the context of hypothesis testing, power and size are important concepts related to the performance of the test. Specifically, for a two-sample t-test, which compares the means of two groups to determine whether there is a significant difference between them, these concepts can be defined as follows:

Power: Power is the probability that a test will correctly reject the null hypothesis when the alternative hypothesis is true. In other words, it measures the ability of the test to detect an actual effect (i.e., a true difference between the two groups). A higher power means that the test is more likely to identify a significant difference when one exists. The power of a test is influenced by factors such as sample size, effect size (the magnitude of the difference between the groups), and the significance level.

Size (or Significance Level): The size of a test, commonly denoted by alpha (α), is the probability of rejecting the null hypothesis when it is true. This is also known as a Type I error, or a false positive. The significance level is a threshold that determines how strong the evidence against the null hypothesis must be before it can be rejected. Commonly used significance levels are 0.05 and 0.01, which correspond to a 5% and 1% chance of making a Type I error, respectively.

For a two-sample t-test, the goal is to strike a balance between the power and size of the test. Ideally, you want to have high power to detect true effects and a low significance level to minimize the chances of making Type I errors. Increasing the sample size is one way to achieve higher power without increasing the risk of Type I errors.
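To make the relationship between power, effect size, sample size, and significance level concrete, the power of a two-sided two-sample t-test can be approximated with the usual normal-approximation formula (a standard textbook expression, not part of the AI's answer), where δ is the difference in means, σ the common standard deviation, n the size of each group, Φ the standard normal CDF, and z the standard normal quantile:

```latex
1 - \beta \;\approx\; \Phi\!\left( \frac{\delta}{\sigma\sqrt{2/n}} \;-\; z_{1-\alpha/2} \right)
```

Power rises with δ and n, and falls as α shrinks (since z_{1-α/2} grows), which is exactly the trade-off explored below.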

I will come back to the code that produces this last figure, but we can clearly see that, from 200 observations onward, the test achieves a reasonable power of 80 percent.

Now, we can look at the Stata code produced by the AI (after a few iterations with ChatGPT 3.5 and 4):

cls
clear

* Set up parameters for power analysis
local alpha = 0.05   // significance level
local effect_size = 0.3 // expected effect size

* Set up range of sample sizes
local min_n = 10
local max_n = 1000
local step_n = 5

* Calculate the number of steps
local num_steps = round((`max_n' - `min_n') / `step_n' + 1)

* Create dataset with sample sizes
clear
set obs `num_steps'
gen sample_size = .
gen power = .
local i = 1

* Define a program to calculate power for a given sample size
capture program drop calc_power
program calc_power, rclass
    args n alpha effect_size
    sampsi 0.0 `effect_size', sd(1.0) n(`n') alpha(`alpha')
    return scalar power = r(power)
end

* Calculate power for each sample size, store and display the results
forvalues n = `min_n'(`step_n')`max_n' {
    calc_power `n' `alpha' `effect_size'
    local power = r(power)
    display "Sample size: " `n' " Power: " `power'
    replace sample_size = `n' in `i'
    replace power = `power' in `i'
    local i = `i' + 1
}
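The first figure can then be drawn from the stored variables; a minimal plotting sketch, mirroring the graph options used later in this post (the output filename is an assumption):

```stata
* Plot power against sample size, marking the conventional 80% threshold
twoway (scatter power sample_size, msymbol(O) msize(small) mcolor(black)), ///
    yline(0.8, lpattern(dash)) ylabel(0(0.1)1) ///
    xtitle("Sample Size") ytitle("Power") title("Power Analysis") ///
    graphregion(color(white)) plotregion(color(white))
graph export "power_analysis_plot.png", replace
```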

Since we have a t-test for two sample means, we want to detect whether the means are different. We build on the Stata command sampsi and write a program that computes the power. Afterward, we store the power and the sample size in variables to draw the figure.

Before looking at the ‘program’ part of the code, a small piece of advice: do not run the code line by line when you use the local command (locals are cleared between separate executions, which will produce an r(198) syntax error). Now, it can be quite pedagogical to explain the ‘program’ part of the code:
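To see why running the code line by line fails, here is a minimal illustration of the pitfall (a sketch, not part of the original do-file):

```stata
* Locals live only for the duration of a single execution.
* Selecting and running both lines together works as expected:
local alpha = 0.05
display 1 - `alpha'
* Run one line at a time, `alpha' expands to nothing and the second
* line becomes "display 1 - ", which throws an r(198) syntax error.
```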

/*

capture program drop calc_power: This line ensures that if a program named calc_power already exists, it will be removed before the new program is defined. capture prevents Stata from displaying an error message if the program does not exist.

program calc_power, rclass: This line starts the definition of a new program named calc_power. The rclass option specifies that the program will return results in the r() return list, which can be accessed after the program is executed.

args n alpha effect_size: This line specifies that the program takes three arguments: n (sample size), alpha (significance level), and effect_size (expected effect size). These arguments will be passed to the program when it's called.

sampsi 0.0 `effect_size', sd(1.0) n(`n') alpha(`alpha'): This line calls Stata's built-in sampsi command to perform a sample size and power analysis. The command calculates the power for a two-sample t-test with the given effect size, sample size, and significance level, assuming a standard deviation of 1.0 for both groups.

return scalar power = r(power): This line stores the calculated power from the sampsi command in a scalar named power. The r(power) refers to the power value that is stored in the r() return list after running the sampsi command. By assigning the value to a scalar in the return list, it can be easily accessed outside the program.

end: This line marks the end of the calc_power program definition.

*/

Now, I can use the previous code to draw the power for different sizes (Type I errors). Unsurprisingly, we observe a trade-off between size and power (1-Type II error). For fewer than 300 observations, we need to accept a loss of power if we want a smaller size in order to detect a difference of 0.3 between the sample means. If we have 100 observations and we want a size of 0.01, we have to accept that the power will be only about 30%.
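This particular point can be checked with a single call to sampsi, without running the whole loop:

```stata
* Power to detect a difference of 0.3 with n = 100 and size = 0.01
sampsi 0.0 0.3, sd(1.0) n(100) alpha(0.01)
display "Power: " r(power)
```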

Suppose that the effect size (here, the difference between the sample means) is now smaller, say 0.15. What would be the result?

The trade-off is even worse: you need to accept a very low power (below 10%) if you want to keep the same size.

Now, we increase the effect size to 0.45. What would be the result?

Now, the trade-off between size and power has disappeared.

Conclusion: if you are working with a small sample and you intend to detect small effects, you have to be careful about the trade-off between size and power.

cls
clear

* Set up parameters for power analysis
local alphas = "0.01 0.025 0.05 0.1 0.2" // different significance levels
local effect_size = 0.45 // expected effect size

* Set up range of sample sizes
local min_n = 10
local max_n = 500
local step_n = 5

* Calculate the number of steps
local num_steps = round((`max_n' - `min_n') / `step_n' + 1)

* Define a program to calculate power for a given sample size and alpha
capture program drop calc_power
program calc_power, rclass
    args n alpha effect_size
    sampsi 0.0 `effect_size', sd(1.0) n(`n') alpha(`alpha')
    return scalar power = r(power)
end

* Create a dataset with sample sizes
clear
set obs `num_steps'
gen sample_size = .

* Generate variables for power at different alpha levels
foreach a in `alphas' {
    local a_underscored: subinstr local a "." "_", all
    gen power_alpha_`a_underscored' = .
}


* Loop through each alpha value and calculate power for each sample size
foreach a in `alphas' {
    local a_underscored: subinstr local a "." "_", all
    local i = 1
    forvalues n = `min_n'(`step_n')`max_n' {
        calc_power `n' `a' `effect_size'
        local power = r(power)
        display "Alpha: " `a' " Sample size: " `n' " Power: " `power'
        replace sample_size = `n' in `i'
        replace power_alpha_`a_underscored' = `power' in `i'
        local i = `i' + 1
    }
}


* Generate a scatter plot of power vs sample size for all alphas
twoway (scatter power_alpha_0_01 sample_size, msymbol(O) msize(small) mcolor(black)) ///
       (scatter power_alpha_0_025 sample_size, msymbol(O) msize(small) mcolor(blue)) ///
       (scatter power_alpha_0_05 sample_size, msymbol(O) msize(small) mcolor(red)) ///
       (scatter power_alpha_0_1 sample_size, msymbol(O) msize(small) mcolor(green)) ///
       (scatter power_alpha_0_2 sample_size, msymbol(O) msize(small) mcolor(orange)), ///
        xscale(range(`min_n' `max_n')) ylabel(0(0.1)1) ///
        xtitle("Sample Size") ytitle("Power") ///
        title("Power Analysis") ///
        legend(order(1 "Alpha = 0.01" 2 "Alpha = 0.025" 3 "Alpha = 0.05" ///
                    4 "Alpha = 0.1" 5 "Alpha = 0.2")) ///
        graphregion(color(white)) plotregion(color(white))


* Save the graph as a PNG file
graph export "power_analysis_plot_combined.png", replace

save power, replace

Please note that the files for replicating this blog are (or will be) available here: https://github.com/JamelSaadaoui/EconMacroBlog
