Working with JSON

JSON is the language of the modern web and the format that apps, APIs, and configuration files speak. In this notebook, we’ll learn to read, navigate, and transform real JSON data exported from a golf shot-tracking app.

What You’ll Learn

What JSON is and how it maps to Python types
JSON vs CSV — when to use which
json.loads() / json.dumps() for strings, json.load() / json.dump() for files
Loading and exploring a real shot-tracking export
Navigating nested data structures
Splitting shots into holes using boundary indices
Converting Apple epoch timestamps to readable dates
Calculating distances between GPS coordinates (Haversine formula)
Building a hole-by-hole round summary
Writing processed results back to JSON

Concept: What is JSON?

JSON (JavaScript Object Notation) is a text-based data format. It looks almost exactly like Python dictionaries and lists: both use curly braces for key-value pairs and square brackets for ordered sequences.

Here’s how JSON types map to Python:

JSON	Python	Example
object	`dict`	`{"club": "Driver", "distance": 245}`
array	`list`	`[4, 3, 5, 4]`
string	`str`	`"North Park"`
number (int)	`int`	`72`
number (float)	`float`	`40.606`
true / false	`bool`	`true` → `True`
null	`None`	`null` → `None`

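That’s it: six types (object, array, string, number, boolean, null). The table gives number two rows only because Python distinguishes int from float. JSON is intentionally simple, which helped it become a near-universal data exchange format.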
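As a quick check of the table above, the following sketch parses one small document containing every JSON type and prints the Python type each value becomes. The sample document and its key names are made up for illustration.

```python
import json

# Hypothetical sample document covering every JSON type
sample = """
{
  "shot": {"club": "Driver"},
  "scores": [4, 3, 5, 4],
  "course": "North Park",
  "par": 72,
  "latitude": 40.606,
  "fairway_hit": true,
  "notes": null
}
"""

parsed = json.loads(sample)

# Print each key alongside the Python type json.loads() produced
for key, value in parsed.items():
    print(f"{key:12s} -> {type(value).__name__}")
```

A JSON number without a decimal point or exponent becomes an int; one with either becomes a float.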

Concept: JSON vs CSV

You’ll encounter both formats constantly. Here’s when to use each:

	CSV	JSON
Structure	Flat, tabular (rows and columns)	Nested, hierarchical (objects inside objects)
Best for	Spreadsheet-like data (scores, stats)	Complex records (API responses, app exports)
Human readable?	Yes, opens in Excel	Yes, but harder for large files
Nested data?	No — must flatten it	Yes — native support
Example	Leaderboard, scoring history	Shot-tracking export, course layout

Rule of thumb: If your data fits naturally in a spreadsheet, use CSV. If it has structure within structure (a round contains holes, each hole contains shots, each shot has a club and coordinates), use JSON.

Real-world golf data is often JSON because a single round contains nested objects — courses with holes, holes with shots, shots with clubs and GPS coordinates. APIs (weather services, golf stat providers, shot-tracking apps) almost always return JSON.
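To make the "must flatten it" trade-off concrete, here is a minimal sketch that turns a small nested round into CSV rows. The nested_round data and the column names are hypothetical, not taken from the real export; the point is that the course and hole values have to be repeated on every row once the nesting is gone.

```python
import csv
import io

# Hypothetical nested round: holes contain shots
nested_round = {
    "course": "North Park",
    "holes": [
        {"hole": 1, "shots": [{"club": "Driver"}, {"club": "Putter"}]},
        {"hole": 2, "shots": [{"club": "6 Iron"}]},
    ],
}

# Flatten: one row per shot, repeating the course and hole on every row
rows = [
    {"course": nested_round["course"], "hole": hole["hole"], "club": shot["club"]}
    for hole in nested_round["holes"]
    for shot in hole["shots"]
]

# Write the rows to an in-memory buffer instead of a file
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["course", "hole", "club"])
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```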

Code: JSON Strings — `json.loads()` and `json.dumps()`

The json module is built into Python. Two functions handle conversion between JSON strings and Python objects:

json.loads(string) — parse a JSON string into Python objects (“load string”)
json.dumps(obj) — convert Python objects into a JSON string (“dump string”)

import json

# A JSON string (imagine this came from an API response)
json_string = '{"club": "Driver", "distance": 245, "fairway_hit": true}'

# Parse it into a Python dict
shot = json.loads(json_string)
print(type(shot))  # <class 'dict'>
print(shot["club"])  # Driver
print(shot["fairway_hit"])  # True (Python bool, not JSON true)

# Convert Python objects back to a JSON string
round_summary = {
    "course": "North Park",
    "score": 42,
    "holes": 9,
    "clubs_used": ["Driver", "6 Iron", "Pitching Wedge", "Putter"],
    "completed": True,
    "weather_notes": None,
}

# Compact format
print(json.dumps(round_summary))

print()

# Pretty-printed (indent=2 is a common choice)
print(json.dumps(round_summary, indent=2))

Notice how Python’s True became true and None became null. Strings are always written with double quotes, because JSON does not allow single-quoted strings. The json module handles all these conversions automatically.
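The double-quote rule also applies when parsing. A small sketch with made-up strings shows that text which looks like a Python dict (single quotes) is rejected by json.loads(), while the double-quoted version parses normally.

```python
import json

# Python-style single quotes are not valid JSON
not_json = "{'club': 'Driver'}"

try:
    json.loads(not_json)
    parsed_ok = True
except json.JSONDecodeError as err:
    parsed_ok = False
    print(f"Invalid JSON: {err.msg}")

# The same data with double quotes parses fine
valid = json.loads('{"club": "Driver"}')
print(valid["club"])
```

This is why printing a dict with print() and saving the output does not produce a usable JSON file; use json.dumps() instead.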

Code: JSON Files — `json.load()` and `json.dump()`

For files, drop the s:

json.load(file) — read a JSON file into Python objects
json.dump(obj, file) — write Python objects to a JSON file

from pathlib import Path

DATA_DIR = Path("../../data")

# Load the shot-tracking app export
with open(DATA_DIR / "shot-tag-round.json", encoding="utf-8") as f:
    round_data = json.load(f)

print(type(round_data))
print(f"Top-level keys: {list(round_data.keys())}")
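If you want to try the file functions without the course data, this sketch does a full write-then-read round trip in a temporary directory. The file name example-shot.json is hypothetical.

```python
import json
import tempfile
from pathlib import Path

shot = {"club": "Driver", "distance": 245, "fairway_hit": True}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "example-shot.json"  # hypothetical file name

    # Write: json.dump() takes the object first, then the open file
    with open(path, "w", encoding="utf-8") as f:
        json.dump(shot, f, indent=2)

    # Read: json.load() takes the open file
    with open(path, encoding="utf-8") as f:
        loaded = json.load(f)

# The loaded object should equal what was written
print(loaded == shot)
```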

Code: Exploring the Structure

Real-world JSON is nested. The first thing to do with any JSON file is understand its shape — what keys exist, what types the values are, and how deep the nesting goes.

# What's at the top level?
for key, value in round_data.items():
    if isinstance(value, list):
        print(f"{key}: list of {len(value)} items")
    elif isinstance(value, dict):
        print(f"{key}: dict with keys {list(value.keys())}")
    else:
        print(f"{key}: {type(value).__name__} = {value}")

# Course name — simple string access
print(f"Course: {round_data['courseName']}")

# Total number of shots
shots = round_data["shots"]
print(f"Total shots recorded: {len(shots)}")

# Hole boundaries — tells us which shot indices start each hole
boundaries = round_data["holeBoundaries"]
print(f"Hole boundaries: {boundaries}")
print(f"Number of holes: {len(boundaries)}")

# Look at the first shot — it's a nested dict
first_shot = shots[0]
print(json.dumps(first_shot, indent=2))

# Navigate nested data: club name from a shot
print(f"First shot club: {shots[0]['club']['name']}")
print(f"First shot lat:  {shots[0]['coordinate']['latitude']}")
print(f"First shot lon:  {shots[0]['coordinate']['longitude']}")
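The same exploration can be automated for deeper files. The helper below is one possible sketch, not part of the json module: describe() is a hypothetical function that walks a parsed value and reports keys and types, and it assumes every item in a list has the same shape as the first one. The sample data is made up, in the same shape as the export.

```python
def describe(value, depth=0, max_depth=3):
    """Return lines describing the shape of a parsed JSON value."""
    indent = "  " * depth
    if isinstance(value, dict):
        lines = []
        for key, child in value.items():
            lines.append(f"{indent}{key}: {type(child).__name__}")
            if depth < max_depth and isinstance(child, (dict, list)):
                lines.extend(describe(child, depth + 1, max_depth))
        return lines
    if isinstance(value, list):
        if not value:
            return [f"{indent}(empty list)"]
        # Assumption: list items share a shape, so describe only the first
        lines = [f"{indent}[{len(value)} items, first is {type(value[0]).__name__}]"]
        if depth < max_depth and isinstance(value[0], (dict, list)):
            lines.extend(describe(value[0], depth + 1, max_depth))
        return lines
    return [f"{indent}{type(value).__name__}"]


# Made-up sample in the same shape as the export
sample = {
    "courseName": "North Park",
    "shots": [
        {"club": {"name": "Driver"}, "coordinate": {"latitude": 1.0, "longitude": 2.0}},
    ],
}

shape_lines = describe(sample)
print("\n".join(shape_lines))
```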

Code: Extracting Unique Clubs

Let’s find every club used during the round. This requires navigating into each shot’s nested club dict.

# Extract unique club names
club_names = set()
for shot in shots:
    club_names.add(shot["club"]["name"])

print(f"Clubs used ({len(club_names)}):")
for name in sorted(club_names):
    print(f"  {name}")

# Count how many times each club was used
club_counts = {}
for shot in shots:
    name = shot["club"]["name"]
    club_counts[name] = club_counts.get(name, 0) + 1

# Sort by frequency (most used first)
for club, count in sorted(club_counts.items(), key=lambda x: x[1], reverse=True):
    print(f"  {club:20s} {count} shots")
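The standard library also offers collections.Counter, which does the counting and the sort-by-frequency in one step. A minimal sketch on made-up shots in the same nested shape:

```python
from collections import Counter

# Hypothetical shots in the same nested shape as the export
sample_shots = [
    {"club": {"name": "Driver"}},
    {"club": {"name": "Putter"}},
    {"club": {"name": "Putter"}},
    {"club": {"name": "6 Iron"}},
]

club_counter = Counter(shot["club"]["name"] for shot in sample_shots)

# most_common() returns (name, count) pairs, highest count first
for club, count in club_counter.most_common():
    print(f"  {club:20s} {count} shots")
```

The manual dict with .get() is worth knowing because it shows what Counter does internally.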

Code: Splitting Shots into Holes

The holeBoundaries array tells us the starting shot index for each hole. For example, [0, 5, 9, 13, ...] means:
- Hole 1: shots at indices 0, 1, 2, 3, 4 (5 shots)
- Hole 2: shots at indices 5, 6, 7, 8 (4 shots)
- Hole 3: shots at indices 9, 10, 11, 12 (4 shots)
- …and so on

We need to pair each boundary with the next one to get the slice for each hole.

boundaries = round_data["holeBoundaries"]
shots = round_data["shots"]

# Split shots into holes
holes = []
for i in range(len(boundaries)):
    start = boundaries[i]
    # End is the next boundary, or the total shot count for the last hole
    end = boundaries[i + 1] if i + 1 < len(boundaries) else len(shots)
    hole_shots = shots[start:end]
    holes.append(hole_shots)

# Verify: strokes per hole (recorded shots plus penalty strokes = the score)
for i, hole in enumerate(holes, start=1):
    # Check for penalty strokes
    penalties = sum(s.get("penaltyStrokeCount", 0) for s in hole)
    score = len(hole) + penalties
    penalty_str = f" (includes {penalties} penalty)" if penalties else ""
    print(f"Hole {i}: {score} strokes{penalty_str}")

total = sum(len(h) + sum(s.get("penaltyStrokeCount", 0) for s in h) for h in holes)
print(f"\nTotal: {total}")
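The "pair each boundary with the next one" step can also be written with zip(). The sketch below uses toy data (13 placeholder shots and the first three boundaries from the example in the text) so the result can be checked by hand.

```python
# Toy data: 13 placeholder shots and the first three boundaries from the text
toy_shots = [{"id": n} for n in range(13)]
toy_boundaries = [0, 5, 9]

# Pair each start with the next start; the last hole ends at len(toy_shots)
ends = toy_boundaries[1:] + [len(toy_shots)]
toy_holes = [toy_shots[start:end] for start, end in zip(toy_boundaries, ends)]

for number, hole in enumerate(toy_holes, start=1):
    print(f"Hole {number}: {len(hole)} shots")
```

Appending len(toy_shots) as the final end index plays the same role as the conditional expression in the loop above: it closes the last hole.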

Code: Converting Apple Epoch Timestamps

The timestamps in this data use the Apple epoch — seconds since January 1, 2001 (not the Unix epoch of January 1, 1970). The difference between the two epochs is exactly 978,307,200 seconds.

To convert:
1. Add the offset to get a Unix timestamp
2. Use datetime.fromtimestamp() to convert to a readable date

from datetime import datetime, timezone

# Apple epoch offset: seconds between 1970-01-01 and 2001-01-01
APPLE_EPOCH_OFFSET = 978_307_200


def apple_epoch_to_datetime(apple_timestamp: float) -> datetime:
    """Convert an Apple epoch timestamp to a Python datetime (UTC)."""
    unix_timestamp = apple_timestamp + APPLE_EPOCH_OFFSET
    return datetime.fromtimestamp(unix_timestamp, tz=timezone.utc)


# Convert the round start and end times
start_time = apple_epoch_to_datetime(round_data["startDate"])
end_time = apple_epoch_to_datetime(round_data["endDate"])
duration = end_time - start_time

# %Z prints the time zone name, making it clear these times are UTC, not local
print(f"Round started:  {start_time.strftime('%B %d, %Y at %I:%M %p %Z')}")
print(f"Round ended:    {end_time.strftime('%B %d, %Y at %I:%M %p %Z')}")
print(f"Duration:       {duration}")

# Time each shot was taken
print("First 5 shots with timestamps:")
for shot in shots[:5]:
    dt = apple_epoch_to_datetime(shot["timestamp"])
    print(f"  {dt.strftime('%I:%M:%S %p')} — {shot['club']['name']}")
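The offset quoted above does not have to be taken on faith. This short check derives it from the two epoch dates and confirms that an Apple timestamp of 0 maps to January 1, 2001 (UTC).

```python
from datetime import datetime, timezone

unix_epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
apple_epoch = datetime(2001, 1, 1, tzinfo=timezone.utc)

# Seconds between the two epochs
offset_seconds = int((apple_epoch - unix_epoch).total_seconds())
print(offset_seconds)

# An Apple timestamp of 0 should land exactly on the Apple epoch
zero = datetime.fromtimestamp(0 + offset_seconds, tz=timezone.utc)
print(zero.isoformat())
```

The span covers 31 years including 8 leap days, which is 11,323 days of 86,400 seconds each.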

Code: Calculating Distances (Haversine Formula)

Each shot has GPS coordinates. To calculate the distance between consecutive shots, we need the Haversine formula, which computes the great-circle distance between two points on a sphere (the Earth).

The formula:

\[ a = \sin^2\left(\frac{\Delta\phi}{2}\right) + \cos(\phi_1) \cdot \cos(\phi_2) \cdot \sin^2\left(\frac{\Delta\lambda}{2}\right) \]

\[ d = 2r \cdot \arctan2\left(\sqrt{a},\ \sqrt{1 - a}\right) \]

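Where:
- \(\phi\) = latitude in radians, \(\lambda\) = longitude in radians
- \(\Delta\phi = \phi_2 - \phi_1\) and \(\Delta\lambda = \lambda_2 - \lambda_1\) are the differences in latitude and longitude between the two points
- \(r\) = Earth’s radius (6,371,000 meters)
- \(d\) = distance in meters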

import math

EARTH_RADIUS_METERS = 6_371_000
METERS_TO_YARDS = 1.09361


def haversine(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Calculate distance in yards between two GPS coordinates."""
    # Convert to radians
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    delta_phi = math.radians(lat2 - lat1)
    delta_lambda = math.radians(lon2 - lon1)

    # Haversine formula
    a = (
        math.sin(delta_phi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(delta_lambda / 2) ** 2
    )
    c = 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))
    distance_meters = EARTH_RADIUS_METERS * c

    return distance_meters * METERS_TO_YARDS


# Test: distance between first two shots (Driver → Pitching Wedge)
shot1 = shots[0]
shot2 = shots[1]
dist = haversine(
    shot1["coordinate"]["latitude"], shot1["coordinate"]["longitude"],
    shot2["coordinate"]["latitude"], shot2["coordinate"]["longitude"],
)
print(f"{shot1['club']['name']} → {shot2['club']['name']}: {dist:.0f} yards")

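# Calculate distances for all consecutive shots in the round
# Note: for the last shot on each hole, the "next" shot is the following tee shot,
# so that row measures the walk to the next tee rather than the shot itself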
print(f"{'#':>3s}  {'Club':20s} {'Distance':>10s}")
print("-" * 37)

for i in range(len(shots) - 1):
    s1 = shots[i]
    s2 = shots[i + 1]
    dist = haversine(
        s1["coordinate"]["latitude"], s1["coordinate"]["longitude"],
        s2["coordinate"]["latitude"], s2["coordinate"]["longitude"],
    )
    print(f"{i + 1:3d}  {s1['club']['name']:20s} {dist:8.0f} yd")

# Last shot has no "next" shot to measure distance to
print(f"{len(shots):3d}  {shots[-1]['club']['name']:20s}        --")
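Before trusting the numbers, it helps to test the formula against a case with a known answer. One degree of latitude along a meridian is an arc of length r·π/180, so the function should reproduce that. The sketch below repeats the formula in a self-contained helper, haversine_meters(), a hypothetical name that returns meters instead of yards; the coordinates are arbitrary test points.

```python
import math

EARTH_RADIUS_METERS = 6_371_000


def haversine_meters(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters (same formula, no yard conversion)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    delta_phi = math.radians(lat2 - lat1)
    delta_lambda = math.radians(lon2 - lon1)
    a = (
        math.sin(delta_phi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(delta_lambda / 2) ** 2
    )
    return 2 * EARTH_RADIUS_METERS * math.atan2(math.sqrt(a), math.sqrt(1 - a))


# One degree of latitude along a meridian is an arc of r * (pi / 180)
expected = EARTH_RADIUS_METERS * math.pi / 180
actual = haversine_meters(0.0, 0.0, 1.0, 0.0)
print(f"expected {expected:.1f} m, got {actual:.1f} m")

# Identical points should be zero distance apart
same_point = haversine_meters(40.0, -80.0, 40.0, -80.0)
print(same_point)
```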

Code: Building a Hole-by-Hole Summary

Let’s combine everything into a structured summary for each hole: shots taken, clubs used, and total distance.

pin_locations = round_data["holePinLocations"]

hole_summaries = []

for hole_num in range(len(holes)):
    hole_shots = holes[hole_num]
    pin = pin_locations[hole_num]

    # Score = number of shots + any penalty strokes
    penalties = sum(s.get("penaltyStrokeCount", 0) for s in hole_shots)
    score = len(hole_shots) + penalties

    # Clubs used (in order)
    clubs = [s["club"]["name"] for s in hole_shots]

    # Distance: sum of distances between consecutive shots within the hole
    total_distance = 0.0
    for i in range(len(hole_shots) - 1):
        s1 = hole_shots[i]
        s2 = hole_shots[i + 1]
        total_distance += haversine(
            s1["coordinate"]["latitude"], s1["coordinate"]["longitude"],
            s2["coordinate"]["latitude"], s2["coordinate"]["longitude"],
        )

    # Distance from last shot to pin
    last_shot = hole_shots[-1]
    dist_to_pin = haversine(
        last_shot["coordinate"]["latitude"], last_shot["coordinate"]["longitude"],
        pin["latitude"], pin["longitude"],
    )
    total_distance += dist_to_pin

    # Time taken
    start = apple_epoch_to_datetime(hole_shots[0]["timestamp"])
    end = apple_epoch_to_datetime(hole_shots[-1]["timestamp"])
    elapsed_minutes = (end - start).total_seconds() / 60

    summary = {
        "hole": hole_num + 1,
        "score": score,
        "penalties": penalties,
        "clubs": clubs,
        "total_distance_yards": round(total_distance),
        "elapsed_minutes": round(elapsed_minutes, 1),
    }
    hole_summaries.append(summary)

# Display the summary
print(f"{'Hole':>4s}  {'Score':>5s}  {'Dist':>6s}  {'Time':>6s}  Clubs")
print("-" * 70)

for h in hole_summaries:
    clubs_str = " → ".join(h["clubs"])
    print(
        f"{h['hole']:4d}  {h['score']:5d}  {h['total_distance_yards']:5d}y  "
        f"{h['elapsed_minutes']:5.1f}m  {clubs_str}"
    )

total_score = sum(h["score"] for h in hole_summaries)
total_dist = sum(h["total_distance_yards"] for h in hole_summaries)
print("-" * 70)
print(f"{'OUT':>4s}  {total_score:5d}  {total_dist:5d}y")

Code: Writing Processed Results to JSON

Now let’s save our processed summary as a new JSON file. This is a common pattern: load raw data, process it, write structured results.

# Build the output structure
processed_round = {
    "course": round_data["courseName"],
    "date": start_time.strftime("%Y-%m-%d"),
    "start_time": start_time.isoformat(),
    "end_time": end_time.isoformat(),
    "total_score": total_score,
    "total_shots_recorded": len(shots),
    "holes": hole_summaries,
}

# Preview it
print(json.dumps(processed_round, indent=2))

# Write to file
output_path = DATA_DIR / "shot-tag-round-summary.json"

with open(output_path, "w", encoding="utf-8") as f:
    json.dump(processed_round, f, indent=2)

print(f"Wrote summary to {output_path}")
print(f"File size: {output_path.stat().st_size:,} bytes")
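The output structure stores times with .isoformat() for a reason: the json module cannot serialize datetime objects directly. The sketch below shows the error and two ways around it. The date is made up, and encode_extra() is a hypothetical helper passed through the default= parameter of json.dumps().

```python
import json
from datetime import datetime, timezone

# Made-up date for illustration
record = {"start_time": datetime(2024, 6, 1, 14, 30, tzinfo=timezone.utc)}

try:
    json.dumps(record)
    serialized_directly = True
except TypeError as err:
    serialized_directly = False
    print(f"TypeError: {err}")

# Option 1: convert to an ISO 8601 string yourself before dumping
explicit = json.dumps({"start_time": record["start_time"].isoformat()})


# Option 2: a fallback that json calls for objects it cannot serialize
def encode_extra(obj):
    if isinstance(obj, datetime):
        return obj.isoformat()
    raise TypeError(f"Cannot serialize {type(obj).__name__}")


via_default = json.dumps(record, default=encode_extra)

print(explicit)
print(explicit == via_default)
```

Either way the value comes back as a plain string when loaded; datetime.fromisoformat() can turn it back into a datetime.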

AI: Working with JSON and Real-World Data

Exercise 1: Apple Epoch Conversion

Ask your AI tool:

“Write a Python function that converts Apple epoch timestamps (seconds since January 1, 2001) to datetime objects. Include type hints, a docstring, and a few test examples.”

Evaluate:
- Does it use the correct offset (978,307,200 seconds)?
- Does it handle timezone awareness?
- Compare the result to our apple_epoch_to_datetime() function above. Did the AI produce anything better or different?

Exercise 2: Haversine Formula

Ask your AI tool:

“Implement the Haversine formula in Python to calculate the distance in yards between two GPS coordinates (latitude, longitude). Include the mathematical formula in the docstring.”

Evaluate:
- Does it use the correct Earth radius (6,371 km)?
- Does it convert degrees to radians?
- Test it with our shot data. Does it produce the same distances as our implementation?
- Check the meters-to-yards conversion factor (1.09361)

Exercise 3: JSON Parser for Scorecard

Give the AI the raw JSON structure and ask it to build a parser:

“I have a JSON file from a golf shot-tracking app. The top-level object has keys: courseName (string), holeBoundaries (list of shot indices marking where each hole starts), holePinLocations (list of {latitude, longitude} for each pin), and shots (list of shot objects with club.name, coordinate.latitude, coordinate.longitude, and optional penaltyStrokeCount). Write a Python function that takes the parsed JSON dict and returns a list of hole dictionaries, each with: hole number, score (including penalties), list of club names used, and the first tee shot club.”

Evaluate:
- Does it handle holeBoundaries correctly, pairing each index with the next to slice the shots list?
- Does it account for penalty strokes?
- Does it handle the last hole correctly (there is no next boundary, so it needs to use len(shots))?
- Test it on our data and compare to the summary we built above

# Paste and test AI-generated code here

Summary

JSON maps directly to Python: objects → dict, arrays → list, null → None
json.loads() / json.dumps() — convert between JSON strings and Python objects
json.load() / json.dump() — read/write JSON files (drop the s for files)
Nested data is navigated by chaining key access: shot["club"]["name"]
holeBoundaries pattern — use boundary indices to split a flat list into groups
Apple epoch — add 978,307,200 seconds to convert to Unix time, then use datetime
Haversine formula — great-circle distance between GPS coordinates
JSON vs CSV — use JSON for nested/hierarchical data, CSV for flat tabular data
Always explore the structure first (keys(), type(), len()) before diving into the data