First of all, you need to install the Anaconda Prompt (conda). Tou have to follow the instructions on the conda website: https://conda.io/projects/conda/en/latest/user-guide/install/windows.html.
Then, you have to install the jupyter notebook. The instructions are available on their website: https://jupyter.org/install.
In a third step, you have to install PyStata: https://www.stata.com/python/pystata/install.html.
In Stata, enter the following command to locate the folder containing Stata 17 : display c(sysdir_stata)
The software is in this folder 'C:\Program Files\Stata17/' in my case and I use the standard edition.
In conda, install PyStata : pip install --upgrade --user stata_setup
Lastly, you have to install the matplotlib library for an illustration with a 3D graph: https://matplotlib.org/stable/users/installing.html
In conda, you can type:
python -m pip install -U pip
python -m pip install -U matplotlib
Now, you are ready to launch Stata from the Jupyter Notebook.
import stata_setup
stata_setup.config('C:\Program Files\Stata17/', 'se')
___ ____ ____ ____ ____ © /__ / ____/ / ____/ 17.0 ___/ / /___/ / /___/ SE—Standard Edition Statistics and Data Science Copyright 1985-2021 StataCorp LLC StataCorp 4905 Lakeway Drive College Station, Texas 77845 USA 800-STATA-PC https://www.stata.com 979-696-4600 stata@stata.com Stata license: Single-user perpetual Serial number: 4...4 Licensed to: Jamel Saadaaoui University of Strasbourg Notes: 1. Unicode is supported; see help unicode_advice. 2. Maximum number of variables is set to 5,000; see help set_maxvar.
%%stata
sysuse auto, clear
summarize mpg
. . sysuse auto, clear (1978 automobile data) . . summarize mpg Variable | Obs Mean Std. dev. Min Max -------------+--------------------------------------------------------- mpg | 74 21.2973 5.785503 12 41 .
%stata scatter mpg price
import pandas as pd
import io
import requests
data = requests.get("https://www.stata.com/python/pystata/misc/nhanes2.csv").content
nhanes2 = pd.read_csv(io.StringIO(data.decode("utf-8")))
%%stata -d nhanes2
sum
. . sum Variable | Obs Mean Std. dev. Min Max -------------+--------------------------------------------------------- sampl | 10,351 33625.57 18412.28 1400 64709 strata | 10,351 16.6667 9.497287 1 32 psu | 10,351 1.481886 .4996959 1 2 region | 0 smsa | 10,351 2.655975 1.282432 1 4 -------------+--------------------------------------------------------- location | 10,351 33.0655 18.41254 1 64 houssiz | 10,351 2.943774 1.695156 1 14 sex | 0 race | 0 age | 10,351 47.57965 17.21483 20 74 -------------+--------------------------------------------------------- height | 10,351 167.6509 9.655916 135.5 200 weight | 10,351 71.89752 15.35642 30.84 175.88 bpsystol | 10,351 130.8817 23.33265 65 300 bpdiast | 10,351 81.715 12.92722 35 150 tcresult | 10,351 217.6697 49.38694 80 828 -------------+--------------------------------------------------------- tgresult | 5,050 143.8958 96.49629 16 2238 hdresult | 8,720 49.64266 14.31176 15 187 hgb | 10,351 14.26044 1.384677 6.9 20.2 hct | 10,351 41.98648 3.67368 20.2 60.7 tibc | 10,351 366.986 55.64079 157 792 -------------+--------------------------------------------------------- iron | 10,351 99.44595 34.08279 16 321 hlthstat | 10,349 2.593487 1.221695 1 8 heartatk | 10,349 .0459948 .2094839 0 1 diabetes | 10,349 .0482172 .2142353 0 1 sizplace | 10,351 5.165588 2.660758 1 8 -------------+--------------------------------------------------------- finalwgt | 10,351 11318.47 7304.04 2000 79634 leadwt | 10,351 11283.84 15011.21 0 81601 corpuscl | 10,262 89.96747 5.52525 58.3 125.9 trnsfern | 10,351 27.60282 10.04152 3.1 94.3 albumin | 10,016 4.669159 .331095 3 5.8 -------------+--------------------------------------------------------- vitaminc | 9,973 1.034814 .5813791 .1 18.1 zinc | 9,202 86.50739 14.47822 43 240 copper | 9,131 125.6094 32.52205 37 346 porphyrn | 10,270 53.67429 25.71968 20 1307 lead | 4,948 14.32033 6.166468 2 80 -------------+--------------------------------------------------------- female | 10,351 .5251667 .4993904 0 1 black | 10,351 .1049174 .3064618 0 1 orace | 10,351 .0193218 .1376601 0 1 fhtatk | 5,434 .0290762 .1680356 0 1 hsizgp | 10,351 2.790938 1.332086 1 5 -------------+--------------------------------------------------------- hsiz1 | 10,351 .167037 .3730269 0 1 hsiz2 | 10,351 .3506908 .4772093 0 1 hsiz3 | 10,351 .1682929 .3741443 0 1 hsiz4 | 10,351 .1522558 .359286 0 1 hsiz5 | 10,351 .1617235 .3682148 0 1 -------------+--------------------------------------------------------- region1 | 10,351 .2024925 .4018767 0 1 region2 | 10,351 .2679934 .4429356 0 1 region3 | 10,351 .2756255 .4468505 0 1 region4 | 10,351 .2538885 .4352556 0 1 smsa1 | 10,351 .2542749 .4354739 0 1 -------------+--------------------------------------------------------- smsa2 | 10,351 .2905999 .4540612 0 1 smsa3 | 10,351 .4551251 .4980062 0 1 rural | 10,351 .3674041 .4821211 0 1 loglead | 4,948 2.577758 .4115249 .6931472 4.382027 agegrp | 0 -------------+--------------------------------------------------------- highlead | 0 bmi | 10,351 25.5376 4.914969 12.3856 61.1297 highbp | 10,351 .4227611 .494022 0 1 .
%%stata
logistic highbp c.age##c.weight
*ereturn list
. . logistic highbp c.age##c.weight Logistic regression Number of obs = 10,351 LR chi2(3) = 2381.23 Prob > chi2 = 0.0000 Log likelihood = -5860.1512 Pseudo R2 = 0.1689 ------------------------------------------------------------------------------ highbp | Odds ratio Std. err. z P>|z| [95% conf. interval] -------------+---------------------------------------------------------------- age | 1.108531 .0080697 14.15 0.000 1.092827 1.12446 weight | 1.081505 .005516 15.36 0.000 1.070748 1.092371 | c.age#| c.weight | .9992788 .0000977 -7.38 0.000 .9990873 .9994703 | _cons | .0002025 .0000787 -21.89 0.000 .0000946 .0004335 ------------------------------------------------------------------------------ Note: _cons estimates baseline odds. . . *ereturn list .
%%stata
quietly margins, at(age=(20(10)80))
marginsplot
. . quietly margins, at(age=(20(10)80)) . . marginsplot Variables that uniquely identify margins: age .
%%stata -doutd preddata
quietly margins, at(age=(20(5)80) weight=(40(5)180)) saving(predictions, replace)
use predictions, clear
list _at1 _at2 _margin in 1/5
rename _at1 age
rename _at2 weight
rename _margin pr_highbp
. . quietly margins, at(age=(20(5)80) weight=(40(5)180)) saving(predictions, repl > ace) . . use predictions, clear (Created by command margins; also see char list) . . list _at1 _at2 _margin in 1/5 +------------------------+ | _at1 _at2 _margin | |------------------------| 1. | 20 40 .0200911 | 2. | 20 45 .0274497 | 3. | 20 50 .0374008 | 4. | 20 55 .0507709 | 5. | 20 60 .0685801 | +------------------------+ . . rename _at1 age . . rename _at2 weight . . rename _margin pr_highbp .
import matplotlib.pyplot as plt
import numpy as np
# define the axes
fig = plt.figure(1, figsize=(10, 8))
ax = plt.axes(projection='3d')
# plot
ax.plot_trisurf(preddata['age'], preddata['weight'], preddata['pr_highbp'],cmap=plt.cm.Spectral_r)
# set ticks and labels for x, y, and z axes
ax.set_xticks(np.arange(20, 90, step=10))
ax.set_yticks(np.arange(40, 200, step=40))
ax.set_zticks(np.arange( 0, 1.2, step=0.2))
ax.set_xlabel("Age (years)")
ax.set_ylabel("Weight (kg)")
ax.set_zlabel("Probability of Hypertension")
# adjust the view angle
ax.view_init(elev=30, azim=240)
# show the plot
plt.show()