BigQuery with GDELT

After two blogs on how to start with GDELT data, I will show how to construct a histogram of bilateral Goldstein score for bilateral events between the US and China since the 1980:

Starting with GDELT data

API with GDELT data at the regional level

After few attempts with ChatGPT 4o, I produced a code to download the data in Google BigQuery:

-- Définir les paramètres de début et de fin depuis 1980
DECLARE start_date DATE DEFAULT DATE '1980-01-01';
DECLARE end_date DATE DEFAULT CURRENT_DATE();

-- Requête pour calculer les moyennes mensuelles des actions des US envers la Chine en excluant certains codes CAMEO
WITH combined_events AS (
  SELECT
    PARSE_DATE('%Y%m%d', CAST(SQLDATE AS STRING)) AS event_date,
    GoldsteinScale,
    NumMentions,
    AvgTone,
    EventCode,
    ActionGeo_FullName,
    ActionGeo_CountryCode,
    ActionGeo_Lat,
    ActionGeo_Long,
    SOURCEURL
  FROM
    `gdelt-bq.full.events`
  WHERE
    Actor1Name = 'UNITED STATES'
    AND Actor2Name = 'CHINA'
    AND PARSE_DATE('%Y%m%d', CAST(SQLDATE AS STRING)) BETWEEN start_date AND end_date
    AND EventCode NOT IN ('0211', '0231', '0254', '0311', '0331', '0354', '054', '071', '085', '110', '1031', '1054', '1211', '1221', '1244', '1621')
  UNION ALL
  SELECT
    PARSE_DATE('%Y%m%d', CAST(SQLDATE AS STRING)) AS event_date,
    GoldsteinScale,
    NumMentions,
    AvgTone,
    EventCode,
    ActionGeo_FullName,
    ActionGeo_CountryCode,
    ActionGeo_Lat,
    ActionGeo_Long,
    SOURCEURL
  FROM
    `gdelt-bq.gdeltv2.events`
  WHERE
    Actor1Name = 'UNITED STATES'
    AND Actor2Name = 'CHINA'
    AND PARSE_DATE('%Y%m%d', CAST(SQLDATE AS STRING)) BETWEEN start_date AND end_date
    AND EventCode NOT IN ('0211', '0231', '0254', '0311', '0331', '0354', '054', '071', '085', '110', '1031', '1054', '1211', '1221', '1244', '1621')
)
SELECT
  FORMAT_TIMESTAMP('%Y-%m', TIMESTAMP(event_date)) AS event_month,
  AVG(CASE WHEN GoldsteinScale > 0 THEN GoldsteinScale ELSE NULL END) AS avg_positive_goldstein_scale,
  AVG(CASE WHEN GoldsteinScale < 0 THEN GoldsteinScale ELSE NULL END) AS avg_negative_goldstein_scale,
  AVG(NumMentions) AS avg_num_mentions,
  AVG(AvgTone) AS avg_tone,
  ARRAY_AGG(DISTINCT EventCode) AS event_codes,
  ANY_VALUE(ActionGeo_FullName) AS location_name,
  ANY_VALUE(ActionGeo_CountryCode) AS country_code,
  ANY_VALUE(ActionGeo_Lat) AS latitude,
  ANY_VALUE(ActionGeo_Long) AS longitude,
  ANY_VALUE(SOURCEURL) AS source_url
FROM
  combined_events
GROUP BY
  event_month
ORDER BY
  event_month;

In Google Colab, use the following code to generate an Excel file:

# Convert BigQuery DataFrame to pandas DataFrame.
pandas_df = bq_df.to_pandas()

# Process the DataFrame
# Convert the 'event_month' column to datetime
bq_df['event_month'] = pd.to_datetime(bq_df['event_month'], errors='coerce')

# Sort the DataFrame by 'event_month'
bq_df = bq_df.sort_values('event_month')

# Format the 'event_month' column as DD-MM-YYYY
bq_df['event_month'] = bq_df['event_month'].dt.strftime('%d-%m-%Y')

# Save the DataFrame to an Excel file
excel_file_path = 'goldstein_scores_analysis.xlsx'
bq_df.to_excel(excel_file_path, index=False)

# Download the Excel file
from google.colab import files
files.download(excel_file_path)

In future blogs, I will refine the CAMEO event codes. Stay tuned.

Drawing bivariate maps with Stata for the NUTS regions (removing Turkey)

A small update of the following blog if you want to remove Turkey:

API with GDELT data at the regional level

Did you know that you can use an API to display instability scores computed from GDELT data? Below, an example for the French region (first administrative level): https://api.gdeltproject.org/api/…

Kernel regressions with Stata

As some of you may know, I have growing interests in the field of Political Economy. It appeared that the set of explained variables that I studied in…

Panel VAR with StataNow 18.5 (Natural Disasters)

In a previous blog, I have shown how to estimate Panel VAR with the new command xtvar introduced in StataNow. This time, I will use the EM-DAT database…

Countries and periods after a panel estimation with Stata

The Economic Cost of Nationalism

US-China Tensions, US Partisan Conflict and Global Oil Prices: Scapegoating? (Applied Economics Letters)

On Theory in Social Sciences

Birkbeck Finance Research Seminar

Leave a Reply Cancel reply

Related Posts

Countries and periods after a panel estimation with Stata

The Economic Cost of Nationalism

US-China Tensions, US Partisan Conflict and Global Oil Prices: Scapegoating? (Applied Economics Letters)

On Theory in Social Sciences

Birkbeck Finance Research Seminar