First step: Installations¶

First of all, you need to install the Anaconda Prompt (conda). Tou have to follow the instructions on the conda website: https://conda.io/projects/conda/en/latest/user-guide/install/windows.html.

Then, you have to install the jupyter notebook. The instructions are available on their website: https://jupyter.org/install.

In a third step, you have to install PyStata: https://www.stata.com/python/pystata/install.html.

In Stata, enter the following command to locate the folder containing Stata 18 : display c(sysdir_stata)

The software is in this folder 'C:\Program Files\Stata18/' in my case and I use the standard edition.

In conda, install PyStata : pip install --upgrade --user stata_setup

Lastly, you have to install the matplotlib library for an illustration with a 3D graph: https://matplotlib.org/stable/users/installing.html

In conda, you can type:

python -m pip install -U pip

python -m pip install -U matplotlib

Now, you are ready to launch Stata from the Jupyter Notebook.

Second step: Lauching Stata from the Jupyter Notebook¶

In [1]:
import stata_setup
stata_setup.config('C:\Program Files\Stata18/', 'se')
  ___  ____  ____  ____  ____ ®
 /__    /   ____/   /   ____/      18.0
___/   /   /___/   /   /___/       SE—Standard Edition

 Statistics and Data Science       Copyright 1985-2023 StataCorp LLC
                                   StataCorp
                                   4905 Lakeway Drive
                                   College Station, Texas 77845 USA
                                   800-STATA-PC        https://www.stata.com
                                   979-696-4600        stata@stata.com

Stata license: Single-user  perpetual
Serial number: 401806202105
  Licensed to: Jamel Saadaoui
               University of Strasbourg

Notes:
      1. Unicode is supported; see help unicode_advice.
      2. Maximum number of variables is set to 5,000 but can be increased;
          see help set_maxvar.
In [2]:
%%stata

sysuse auto, clear

summarize mpg

summarize
. 
. sysuse auto, clear
(1978 automobile data)

. 
. summarize mpg

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
         mpg |         74     21.2973    5.785503         12         41

. 
. summarize

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
        make |          0
       price |         74    6165.257    2949.496       3291      15906
         mpg |         74     21.2973    5.785503         12         41
       rep78 |         69    3.405797    .9899323          1          5
    headroom |         74    2.993243    .8459948        1.5          5
-------------+---------------------------------------------------------
       trunk |         74    13.75676    4.277404          5         23
      weight |         74    3019.459    777.1936       1760       4840
      length |         74    187.9324    22.26634        142        233
        turn |         74    39.64865    4.399354         31         51
displacement |         74    197.2973    91.83722         79        425
-------------+---------------------------------------------------------
  gear_ratio |         74    3.014865    .4562871       2.19       3.89
     foreign |         74    .2972973    .4601885          0          1

. 
In [3]:
%%stata

*twoway (histogram mpg, color(red%30))
. 
. *twoway (histogram mpg, color(red%30))
. 
In [4]:
import pandas as pd

import io

import requests
In [5]:
data = requests.get("https://www.jamelsaadaoui.com/wp-content/uploads/2023/05/nhanes2.csv").content

nhanes2 = pd.read_csv(io.StringIO(data.decode("utf-8")))
In [6]:
%%stata -d nhanes2

sum
. 
. sum

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
       sampl |     10,351    33625.57    18412.28       1400      64709
      strata |     10,351     16.6667    9.497287          1         32
         psu |          0
      region |          0
        smsa |          0
-------------+---------------------------------------------------------
    location |     10,351     33.0655    18.41254          1         64
     houssiz |     10,351    2.943774    1.695156          1         14
         sex |          0
        race |          0
         age |     10,351    47.57965    17.21483         20         74
-------------+---------------------------------------------------------
      height |     10,351    167.6509    9.655916      135.5        200
      weight |     10,351    71.89752    15.35642      30.84     175.88
    bpsystol |     10,351    130.8817    23.33265         65        300
     bpdiast |     10,351      81.715    12.92722         35        150
    tcresult |     10,351    217.6697    49.38694         80        828
-------------+---------------------------------------------------------
    tgresult |      5,050    143.8958    96.49629         16       2238
    hdresult |      8,720    49.64266    14.31176         15        187
         hgb |     10,351    14.26044    1.384677        6.9       20.2
         hct |     10,351    41.98648     3.67368       20.2       60.7
        tibc |     10,351     366.986    55.64079        157        792
-------------+---------------------------------------------------------
        iron |     10,351    99.44595    34.08279         16        321
    hlthstat |          0
    heartatk |          0
    diabetes |          0
    sizplace |          0
-------------+---------------------------------------------------------
    finalwgt |     10,351    11318.47     7304.04       2000      79634
      leadwt |     10,351    11283.84    15011.21          0      81601
    corpuscl |     10,262    89.96747     5.52525       58.3      125.9
    trnsfern |     10,351    27.60282    10.04152        3.1       94.3
     albumin |     10,016    4.669159     .331095          3        5.8
-------------+---------------------------------------------------------
    vitaminc |      9,973    1.034814    .5813791         .1       18.1
        zinc |      9,202    86.50739    14.47822         43        240
      copper |      9,131    125.6094    32.52205         37        346
    porphyrn |     10,270    53.67429    25.71968         20       1307
        lead |      4,948    14.32033    6.166468          2         80
-------------+---------------------------------------------------------
      female |          0
       black |          0
       orace |          0
      fhtatk |          0
      hsizgp |     10,351    2.790938    1.332086          1          5
-------------+---------------------------------------------------------
       hsiz1 |     10,351     .167037    .3730269          0          1
       hsiz2 |     10,351    .3506908    .4772093          0          1
       hsiz3 |     10,351    .1682929    .3741443          0          1
       hsiz4 |     10,351    .1522558     .359286          0          1
       hsiz5 |     10,351    .1617235    .3682148          0          1
-------------+---------------------------------------------------------
     region1 |     10,351    .2024925    .4018767          0          1
     region2 |     10,351    .2679934    .4429356          0          1
     region3 |     10,351    .2756255    .4468505          0          1
     region4 |     10,351    .2538885    .4352556          0          1
       smsa1 |     10,351    .2542749    .4354739          0          1
-------------+---------------------------------------------------------
       smsa2 |     10,351    .2905999    .4540612          0          1
       smsa3 |     10,351    .4551251    .4980062          0          1
       rural |          0
     loglead |      4,948    2.577758    .4115249   .6931472   4.382027
      agegrp |          0
-------------+---------------------------------------------------------
    highlead |          0
         bmi |     10,351     25.5376    4.914969    12.3856    61.1297
      highbp |     10,351    .4227611     .494022          0          1

. 
In [7]:
%%stata

logistic highbp c.age##c.weight

*ereturn list
. 
. logistic highbp c.age##c.weight

Logistic regression                                    Number of obs =  10,351
                                                       LR chi2(3)    = 2381.23
                                                       Prob > chi2   =  0.0000
Log likelihood = -5860.1512                            Pseudo R2     =  0.1689

------------------------------------------------------------------------------
      highbp | Odds ratio   Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
         age |   1.108531   .0080697    14.15   0.000     1.092827     1.12446
      weight |   1.081505    .005516    15.36   0.000     1.070748    1.092371
             |
       c.age#|
    c.weight |   .9992788   .0000977    -7.38   0.000     .9990873    .9994703
             |
       _cons |   .0002025   .0000787   -21.89   0.000     .0000946    .0004335
------------------------------------------------------------------------------
Note: _cons estimates baseline odds.

. 
. *ereturn list
. 
In [8]:
%%stata -eret myeret
logistic highbp c.age##c.weight
Logistic regression                                    Number of obs =  10,351
                                                       LR chi2(3)    = 2381.23
                                                       Prob > chi2   =  0.0000
Log likelihood = -5860.1512                            Pseudo R2     =  0.1689

------------------------------------------------------------------------------
      highbp | Odds ratio   Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
         age |   1.108531   .0080697    14.15   0.000     1.092827     1.12446
      weight |   1.081505    .005516    15.36   0.000     1.070748    1.092371
             |
       c.age#|
    c.weight |   .9992788   .0000977    -7.38   0.000     .9990873    .9994703
             |
       _cons |   .0002025   .0000787   -21.89   0.000     .0000946    .0004335
------------------------------------------------------------------------------
Note: _cons estimates baseline odds.
In [9]:
myeret
Out[9]:
{'e(rank)': 4.0,
 'e(N)': 10351.0,
 'e(ic)': 4.0,
 'e(k)': 4.0,
 'e(k_eq)': 1.0,
 'e(noconstant)': 0.0,
 'e(consonly)': 0.0,
 'e(k_dv)': 1.0,
 'e(converged)': 1.0,
 'e(rc)': 0.0,
 'e(k_autoCns)': 0.0,
 'e(ll)': -5860.151215434775,
 'e(k_eq_model)': 1.0,
 'e(ll_0)': -7050.765484416371,
 'e(df_m)': 3.0,
 'e(chi2)': 2381.228537963192,
 'e(p)': 0.0,
 'e(N_cdf)': 0.0,
 'e(N_cds)': 0.0,
 'e(r2_p)': 0.1688631215451849,
 'e(cmdline)': 'logistic highbp c.age##c.weight',
 'e(cmd)': 'logistic',
 'e(predict)': 'logistic_p',
 'e(estat_cmd)': 'logit_estat',
 'e(marginsderiv)': 'default Pr',
 'e(marginsok)': 'default Pr',
 'e(marginsnotok)': 'stdp DBeta DEviance DX2 DDeviance Hat Number Residuals RStandard SCore',
 'e(title)': 'Logistic regression',
 'e(chi2type)': 'LR',
 'e(deriv_useminbound)': 'off',
 'e(opt)': 'moptimize',
 'e(vce)': 'oim',
 'e(user)': 'mopt__logit_d2()',
 'e(crittype)': 'log likelihood',
 'e(ml_method)': 'd2',
 'e(singularHmethod)': 'm-marquardt',
 'e(technique)': 'nr',
 'e(which)': 'max',
 'e(depvar)': 'highbp',
 'e(properties)': 'b V',
 'e(b)': array([[ 1.03035512e-01,  7.83537336e-02, -7.21492364e-04,
         -8.50485074e+00]]),
 'e(V)': array([[ 5.29930775e-05,  3.50509320e-05, -6.97861007e-07,
         -2.69423165e-03],
        [ 3.50509320e-05,  2.60132665e-05, -4.74084055e-07,
         -1.94299577e-03],
        [-6.97861007e-07, -4.74084055e-07,  9.55811843e-09,
          3.50377702e-05],
        [-2.69423165e-03, -1.94299577e-03,  3.50377702e-05,
          1.50887843e-01]]),
 'e(mns)': array([[4.75796541e+01, 7.18975210e+01, 3.43111878e+03, 1.00000000e+00]]),
 'e(rules)': array([[0., 0., 0., 0.]]),
 'e(ilog)': array([[-7050.76548442, -5886.45946281, -5860.19509926, -5860.15121566,
         -5860.15121543,     0.        ,     0.        ,     0.        ,
             0.        ,     0.        ,     0.        ,     0.        ,
             0.        ,     0.        ,     0.        ,     0.        ,
             0.        ,     0.        ,     0.        ,     0.        ]]),
 'e(gradient)': array([[5.49611647e-06, 8.04060803e-06, 5.24302277e-04, 8.32616448e-08]])}
In [10]:
e_v = myeret['e(V)']
e_v
Out[10]:
array([[ 5.29930775e-05,  3.50509320e-05, -6.97861007e-07,
        -2.69423165e-03],
       [ 3.50509320e-05,  2.60132665e-05, -4.74084055e-07,
        -1.94299577e-03],
       [-6.97861007e-07, -4.74084055e-07,  9.55811843e-09,
         3.50377702e-05],
       [-2.69423165e-03, -1.94299577e-03,  3.50377702e-05,
         1.50887843e-01]])
In [11]:
%%stata

quietly margins, at(age=(20(10)80))

*marginsplot
. 
. quietly margins, at(age=(20(10)80))

. 
. *marginsplot
. 
In [12]:
%%stata -doutd preddata

quietly margins, at(age=(20(5)80) weight=(40(5)180)) saving(predictions, replace)

use predictions, clear

list _at1 _at2 _margin in 1/5

rename _at1 age

rename _at2 weight

rename _margin pr_highbp
. 
. quietly margins, at(age=(20(5)80) weight=(40(5)180)) saving(predictions, repl
> ace)

. 
. use predictions, clear
(Created by command margins; also see char list)

. 
. list _at1 _at2 _margin in 1/5

     +------------------------+
     | _at1   _at2    _margin |
     |------------------------|
  1. |   20     40   .0200911 |
  2. |   20     45   .0274497 |
  3. |   20     50   .0374008 |
  4. |   20     55   .0507709 |
  5. |   20     60   .0685801 |
     +------------------------+

. 
. rename _at1 age

. 
. rename _at2 weight

. 
. rename _margin pr_highbp

. 
In [13]:
import matplotlib.pyplot as plt

import numpy as np

 

# define the axes

fig = plt.figure(1, figsize=(10, 8))

ax = plt.axes(projection='3d')

 

# plot

ax.plot_trisurf(preddata['age'], preddata['weight'], preddata['pr_highbp'],cmap=plt.cm.Spectral_r)

 

# set ticks and labels for x, y, and z axes

ax.set_xticks(np.arange(20, 90, step=10))

ax.set_yticks(np.arange(40, 200, step=40))

ax.set_zticks(np.arange( 0, 1.2, step=0.2))

ax.set_xlabel("Age (years)")

ax.set_ylabel("Weight (kg)")

ax.set_zlabel("Probability of Hypertension")

 

# adjust the view angle

ax.view_init(elev=30, azim=240)

 

# show the plot

plt.show()