First, you need to install conda, which provides the Anaconda Prompt. Follow the instructions on the conda website: https://conda.io/projects/conda/en/latest/user-guide/install/windows.html.
Second, install the Jupyter Notebook. The instructions are available on the Jupyter website: https://jupyter.org/install.
Third, install PyStata: https://www.stata.com/python/pystata/install.html.
In Stata, enter the following command to locate the folder containing Stata 17: display c(sysdir_stata)
In my case, Stata is installed in 'C:\Program Files\Stata17/', and I use the Standard Edition (SE).
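Before passing this folder to PyStata, it can help to confirm from Python that it really exists. The sketch below is only an illustration: `check_stata_dir` is a hypothetical helper name, not part of PyStata or stata_setup, and the path in the comment is the one from my machine.

```python
from pathlib import Path


def check_stata_dir(path_text):
    """Return the folder as a Path if it exists, otherwise raise an error.

    Hypothetical helper for illustration; it is not part of stata_setup.
    """
    folder = Path(path_text)
    if not folder.is_dir():
        raise FileNotFoundError(f"Stata folder not found: {folder}")
    return folder


# Example call (commented out because the folder differs between machines).
# The r"..." raw string keeps the backslashes of a Windows path literal.
# stata_dir = check_stata_dir(r"C:\Program Files\Stata17")
```

If the check fails, rerun display c(sysdir_stata) in Stata and copy the folder it reports.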
In the Anaconda Prompt, install the stata_setup package, which configures PyStata: pip install --upgrade --user stata_setup
Lastly, install the matplotlib library, which is used below to draw a 3D graph: https://matplotlib.org/stable/users/installing.html
In the Anaconda Prompt, you can type:
python -m pip install -U pip
python -m pip install -U matplotlib
Now you are ready to launch Stata from the Jupyter Notebook.
import stata_setup
# Raw string (r'...') so the backslashes in the Windows path are not read as escape sequences
stata_setup.config(r'C:\Program Files\Stata17/', 'se')
___ ____ ____ ____ ____ ©
/__ / ____/ / ____/ 17.0
___/ / /___/ / /___/ SE—Standard Edition
Statistics and Data Science Copyright 1985-2021 StataCorp LLC
StataCorp
4905 Lakeway Drive
College Station, Texas 77845 USA
800-STATA-PC https://www.stata.com
979-696-4600 stata@stata.com
Stata license: Single-user perpetual
Serial number: 4...4
Licensed to: Jamel Saadaaoui
University of Strasbourg
Notes:
1. Unicode is supported; see help unicode_advice.
2. Maximum number of variables is set to 5,000; see help set_maxvar.
%%stata
sysuse auto, clear
summarize mpg
.
. sysuse auto, clear
(1978 automobile data)
.
. summarize mpg
Variable | Obs Mean Std. dev. Min Max
-------------+---------------------------------------------------------
mpg | 74 21.2973 5.785503 12 41
.
%stata scatter mpg price
import pandas as pd
import io
import requests
data = requests.get("https://www.stata.com/python/pystata/misc/nhanes2.csv").content
nhanes2 = pd.read_csv(io.StringIO(data.decode("utf-8")))
%%stata -d nhanes2
sum
.
. sum
Variable | Obs Mean Std. dev. Min Max
-------------+---------------------------------------------------------
sampl | 10,351 33625.57 18412.28 1400 64709
strata | 10,351 16.6667 9.497287 1 32
psu | 10,351 1.481886 .4996959 1 2
region | 0
smsa | 10,351 2.655975 1.282432 1 4
-------------+---------------------------------------------------------
location | 10,351 33.0655 18.41254 1 64
houssiz | 10,351 2.943774 1.695156 1 14
sex | 0
race | 0
age | 10,351 47.57965 17.21483 20 74
-------------+---------------------------------------------------------
height | 10,351 167.6509 9.655916 135.5 200
weight | 10,351 71.89752 15.35642 30.84 175.88
bpsystol | 10,351 130.8817 23.33265 65 300
bpdiast | 10,351 81.715 12.92722 35 150
tcresult | 10,351 217.6697 49.38694 80 828
-------------+---------------------------------------------------------
tgresult | 5,050 143.8958 96.49629 16 2238
hdresult | 8,720 49.64266 14.31176 15 187
hgb | 10,351 14.26044 1.384677 6.9 20.2
hct | 10,351 41.98648 3.67368 20.2 60.7
tibc | 10,351 366.986 55.64079 157 792
-------------+---------------------------------------------------------
iron | 10,351 99.44595 34.08279 16 321
hlthstat | 10,349 2.593487 1.221695 1 8
heartatk | 10,349 .0459948 .2094839 0 1
diabetes | 10,349 .0482172 .2142353 0 1
sizplace | 10,351 5.165588 2.660758 1 8
-------------+---------------------------------------------------------
finalwgt | 10,351 11318.47 7304.04 2000 79634
leadwt | 10,351 11283.84 15011.21 0 81601
corpuscl | 10,262 89.96747 5.52525 58.3 125.9
trnsfern | 10,351 27.60282 10.04152 3.1 94.3
albumin | 10,016 4.669159 .331095 3 5.8
-------------+---------------------------------------------------------
vitaminc | 9,973 1.034814 .5813791 .1 18.1
zinc | 9,202 86.50739 14.47822 43 240
copper | 9,131 125.6094 32.52205 37 346
porphyrn | 10,270 53.67429 25.71968 20 1307
lead | 4,948 14.32033 6.166468 2 80
-------------+---------------------------------------------------------
female | 10,351 .5251667 .4993904 0 1
black | 10,351 .1049174 .3064618 0 1
orace | 10,351 .0193218 .1376601 0 1
fhtatk | 5,434 .0290762 .1680356 0 1
hsizgp | 10,351 2.790938 1.332086 1 5
-------------+---------------------------------------------------------
hsiz1 | 10,351 .167037 .3730269 0 1
hsiz2 | 10,351 .3506908 .4772093 0 1
hsiz3 | 10,351 .1682929 .3741443 0 1
hsiz4 | 10,351 .1522558 .359286 0 1
hsiz5 | 10,351 .1617235 .3682148 0 1
-------------+---------------------------------------------------------
region1 | 10,351 .2024925 .4018767 0 1
region2 | 10,351 .2679934 .4429356 0 1
region3 | 10,351 .2756255 .4468505 0 1
region4 | 10,351 .2538885 .4352556 0 1
smsa1 | 10,351 .2542749 .4354739 0 1
-------------+---------------------------------------------------------
smsa2 | 10,351 .2905999 .4540612 0 1
smsa3 | 10,351 .4551251 .4980062 0 1
rural | 10,351 .3674041 .4821211 0 1
loglead | 4,948 2.577758 .4115249 .6931472 4.382027
agegrp | 0
-------------+---------------------------------------------------------
highlead | 0
bmi | 10,351 25.5376 4.914969 12.3856 61.1297
highbp | 10,351 .4227611 .494022 0 1
.
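In the summary above, region, sex, race, agegrp, and highlead show 0 observations. Stata's summarize reports no observations for string variables, so the likely explanation is that these columns hold text in the DataFrame. A minimal sketch of how you might spot such columns on the pandas side before sending the data to Stata is given below. The rows are made up for illustration and only stand in for nhanes2.

```python
import io

import pandas as pd

# Tiny stand-in for nhanes2 with made-up rows; only the column types matter.
sample_csv = io.StringIO(
    "region,sex,age,weight\n"
    "South,Male,54,62.5\n"
    "West,Female,41,71.0\n"
)
sample = pd.read_csv(sample_csv)

# Non-numeric columns are the ones that would reach Stata as strings.
text_columns = sample.select_dtypes(exclude="number").columns.tolist()
print(text_columns)  # ['region', 'sex']
```

The same `select_dtypes` call can be applied to nhanes2 itself in the notebook.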
%%stata
logistic highbp c.age##c.weight
*ereturn list
.
. logistic highbp c.age##c.weight
Logistic regression Number of obs = 10,351
LR chi2(3) = 2381.23
Prob > chi2 = 0.0000
Log likelihood = -5860.1512 Pseudo R2 = 0.1689
------------------------------------------------------------------------------
highbp | Odds ratio Std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
age | 1.108531 .0080697 14.15 0.000 1.092827 1.12446
weight | 1.081505 .005516 15.36 0.000 1.070748 1.092371
|
c.age#|
c.weight | .9992788 .0000977 -7.38 0.000 .9990873 .9994703
|
_cons | .0002025 .0000787 -21.89 0.000 .0000946 .0004335
------------------------------------------------------------------------------
Note: _cons estimates baseline odds.
.
. *ereturn list
.
%%stata
quietly margins, at(age=(20(10)80))
marginsplot
.
. quietly margins, at(age=(20(10)80))
.
. marginsplot
Variables that uniquely identify margins: age
.
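The expression 20(10)80 in the at() option is a Stata numlist: start at 20 and go up to 80 in steps of 10. As a rough Python equivalent, the sketch below expands such a numlist with range. `expand_numlist` is a hypothetical helper name, and it only handles positive integer steps.

```python
def expand_numlist(first, step, last):
    """Expand a Stata-style numlist first(step)last into a list of integers.

    Illustrative helper; assumes integer bounds and a positive integer step.
    """
    return list(range(first, last + 1, step))


ages = expand_numlist(20, 10, 80)
print(ages)  # [20, 30, 40, 50, 60, 70, 80]
```

So margins evaluates the predicted probability at seven values of age.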
%%stata -doutd preddata
quietly margins, at(age=(20(5)80) weight=(40(5)180)) saving(predictions, replace)
use predictions, clear
list _at1 _at2 _margin in 1/5
rename _at1 age
rename _at2 weight
rename _margin pr_highbp
.
. quietly margins, at(age=(20(5)80) weight=(40(5)180)) saving(predictions, repl
> ace)
.
. use predictions, clear
(Created by command margins; also see char list)
.
. list _at1 _at2 _margin in 1/5
+------------------------+
| _at1 _at2 _margin |
|------------------------|
1. | 20 40 .0200911 |
2. | 20 45 .0274497 |
3. | 20 50 .0374008 |
4. | 20 55 .0507709 |
5. | 20 60 .0685801 |
+------------------------+
.
. rename _at1 age
.
. rename _at2 weight
.
. rename _margin pr_highbp
.
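Because age and weight are the only covariates in the model, each `_margin` listed above is simply the predicted probability at that (age, weight) pair. As a sanity check, the sketch below recomputes those probabilities in plain Python from the odds ratios displayed in the logistic output. The inputs are the rounded values that Stata prints, so the results should agree with `_margin` only to a few decimal places. The names `prob_highbp`, `ODDS_RATIOS`, and `BASELINE_ODDS` are my own and are not created by Stata.

```python
import math
from itertools import product

# Odds ratios copied from the logistic output (rounded by Stata's display).
ODDS_RATIOS = {"age": 1.108531, "weight": 1.081505, "age_x_weight": 0.9992788}
BASELINE_ODDS = 0.0002025  # the _cons row


def prob_highbp(age, weight):
    """Predicted probability of high blood pressure from the displayed odds ratios."""
    log_odds = (
        math.log(BASELINE_ODDS)
        + age * math.log(ODDS_RATIOS["age"])
        + weight * math.log(ODDS_RATIOS["weight"])
        + age * weight * math.log(ODDS_RATIOS["age_x_weight"])
    )
    return 1.0 / (1.0 + math.exp(-log_odds))


# Same grid as at(age=(20(5)80) weight=(40(5)180)); age varies slowest,
# matching the order of the rows in the listing.
grid = list(product(range(20, 81, 5), range(40, 181, 5)))
print(len(grid), "grid points")

# Compare these with the first five rows of the listing.
for age, weight in grid[:5]:
    print(age, weight, round(prob_highbp(age, weight), 4))
```

The grid has 13 ages times 29 weights, which is the number of points the 3D surface is drawn from.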
import matplotlib.pyplot as plt
import numpy as np
# define the axes
fig = plt.figure(1, figsize=(10, 8))
ax = plt.axes(projection='3d')
# plot
ax.plot_trisurf(preddata['age'], preddata['weight'], preddata['pr_highbp'], cmap=plt.cm.Spectral_r)
# set ticks and labels for x, y, and z axes
ax.set_xticks(np.arange(20, 90, step=10))
ax.set_yticks(np.arange(40, 200, step=40))
ax.set_zticks(np.arange(0, 1.2, step=0.2))
ax.set_xlabel("Age (years)")
ax.set_ylabel("Weight (kg)")
ax.set_zlabel("Probability of Hypertension")
# adjust the view angle
ax.view_init(elev=30, azim=240)
# show the plot
plt.show()