import json
# A JSON string (imagine this came from an API response)
json_string = '{"club": "Driver", "distance": 245, "fairway_hit": true}'
# Parse it into a Python dict
shot = json.loads(json_string)
print(type(shot)) # <class 'dict'>
print(shot["club"]) # Driver
print(shot["fairway_hit"]) # True (Python bool, not JSON true)
Working with JSON
JSON is the language of the modern web and the format that apps, APIs, and configuration files speak. In this notebook, we’ll learn to read, navigate, and transform real JSON data exported from a golf shot-tracking app.
What You’ll Learn
- What JSON is and how it maps to Python types
- JSON vs CSV — when to use which
- json.loads()/json.dumps() for strings, json.load()/json.dump() for files
- Loading and exploring a real shot-tracking export
- Navigating nested data structures
- Splitting shots into holes using boundary indices
- Converting Apple epoch timestamps to readable dates
- Calculating distances between GPS coordinates (Haversine formula)
- Building a hole-by-hole round summary
- Writing processed results back to JSON
Concept: What is JSON?
JSON (JavaScript Object Notation) is a text-based data format. It looks almost exactly like Python dictionaries and lists: JSON borrows JavaScript's object and array literal syntax, which closely resembles Python's dict and list literals.
Here’s how JSON types map to Python:
| JSON | Python | Example |
|---|---|---|
| object | dict | {"club": "Driver", "distance": 245} |
| array | list | [4, 3, 5, 4] |
| string | str | "North Park" |
| number (int) | int | 72 |
| number (float) | float | 40.606 |
| true / false | bool | true → True |
| null | None | null → None |
That’s it: six types (the table lists number twice because Python splits it into int and float). JSON is intentionally simple, which helped it become the most widely used data exchange format.
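The mapping can be checked directly. The sketch below parses a made-up document (the key names are illustrative only) that contains one value of each JSON type and records the Python type each one becomes.

```python
import json

# Hypothetical document with one value per JSON type from the table above
sample = (
    '{"obj": {"club": "Driver"}, "arr": [4, 3, 5, 4], "text": "North Park", '
    '"whole": 72, "frac": 40.606, "flag": true, "nothing": null}'
)

parsed = json.loads(sample)

# Record the Python type name each JSON value was parsed into
type_names = {key: type(value).__name__ for key, value in parsed.items()}
for key, name in type_names.items():
    print(f"{key:8s} -> {name}")
```

Note that null becomes None, whose type name is NoneType.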
Concept: JSON vs CSV
You’ll encounter both formats constantly. Here’s when to use each:
| | CSV | JSON |
|---|---|---|
| Structure | Flat, tabular (rows and columns) | Nested, hierarchical (objects inside objects) |
| Best for | Spreadsheet-like data (scores, stats) | Complex records (API responses, app exports) |
| Human readable? | Yes, opens in Excel | Yes, but harder for large files |
| Nested data? | No — must flatten it | Yes — native support |
| Example | Leaderboard, scoring history | Shot-tracking export, course layout |
Rule of thumb: If your data fits naturally in a spreadsheet, use CSV. If it has structure within structure (a round contains holes, each hole contains shots, each shot has a club and coordinates), use JSON.
Real-world golf data is often JSON because a single round contains nested objects — courses with holes, holes with shots, shots with clubs and GPS coordinates. APIs (weather services, golf stat providers, shot-tracking apps) almost always return JSON.
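To make the "must flatten it" row concrete, here is a minimal sketch that turns a small nested round into CSV rows. The structure and field names are invented for illustration and are not the shot-tracking export format used later.

```python
import csv
import io

# Hypothetical nested round: holes contain shots (not the real export format)
round_doc = {
    "course": "North Park",
    "holes": [
        {"number": 1, "shots": [{"club": "Driver", "distance": 245},
                                {"club": "Putter", "distance": 3}]},
        {"number": 2, "shots": [{"club": "6 Iron", "distance": 160}]},
    ],
}

# Flatten: one row per shot, repeating the parent fields on every row
rows = [
    {"course": round_doc["course"], "hole": hole["number"],
     "club": shot["club"], "distance": shot["distance"]}
    for hole in round_doc["holes"]
    for shot in hole["shots"]
]

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["course", "hole", "club", "distance"])
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

The course name and hole number are repeated on every row. That repetition is the cost of forcing hierarchical data into a flat table.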
Code: JSON Strings — json.loads() and json.dumps()
The json module is built into Python. Two functions handle conversion between JSON strings and Python objects:
- json.loads(string): parse a JSON string into Python objects (“load string”)
- json.dumps(obj): convert Python objects into a JSON string (“dump string”)
# Convert Python objects back to a JSON string
round_summary = {
    "course": "North Park",
    "score": 42,
    "holes": 9,
    "clubs_used": ["Driver", "6 Iron", "Pitching Wedge", "Putter"],
    "completed": True,
    "weather_notes": None,
}
# Compact format
print(json.dumps(round_summary))
print()
# Pretty-printed (indent=2 is the convention)
print(json.dumps(round_summary, indent=2))
Notice how Python’s True became true and None became null. Strings always come out in double quotes, because JSON does not allow single-quoted strings. The json module handles all these conversions automatically.
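The conversions are automatic but not always reversible. The sketch below uses made-up data to show two cases where a dumps/loads round trip changes the value: tuples come back as lists, and non-string dict keys come back as strings.

```python
import json

# Hypothetical data: a tuple value and an integer key
original = {"pin": (40.606, -80.012), 1: "first hole", "score": 42}

restored = json.loads(json.dumps(original))
print(restored)

# The tuple came back as a list and the integer key as the string "1"
is_identical = restored == original
print(is_identical)
```

This matters if later code relies on tuple values or integer keys after reloading a file.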
Code: JSON Files — json.load() and json.dump()
For files, drop the s:
- json.load(file): read a JSON file into Python objects
- json.dump(obj, file): write Python objects to a JSON file
from pathlib import Path
DATA_DIR = Path("../../data")
# Load the shot-tracking app export
with open(DATA_DIR / "shot-tag-round.json") as f:
    round_data = json.load(f)
print(type(round_data))
print(f"Top-level keys: {list(round_data.keys())}")
Code: Exploring the Structure
Real-world JSON is nested. The first thing to do with any JSON file is understand its shape — what keys exist, what types the values are, and how deep the nesting goes.
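One hedged way to see the shape at every level is a small recursive helper. The name describe_shape is hypothetical (it is not part of the json module). The helper inspects only the first item of each list, on the assumption that list items share a structure.

```python
def describe_shape(value, depth=0):
    """Return indented lines summarizing the structure of parsed JSON."""
    indent = "  " * depth
    if isinstance(value, dict):
        lines = [f"{indent}dict with {len(value)} keys"]
        for key, child in value.items():
            lines.append(f"{indent}  {key}:")
            lines.extend(describe_shape(child, depth + 2))
        return lines
    if isinstance(value, list):
        lines = [f"{indent}list of {len(value)} items"]
        # Only inspect the first item, assuming the list is homogeneous
        if value:
            lines.extend(describe_shape(value[0], depth + 1))
        return lines
    return [f"{indent}{type(value).__name__}"]

# Tiny made-up sample; the same call works on any parsed JSON value
sample = {"courseName": "North Park", "shots": [{"club": {"name": "Driver"}}]}
shape_lines = describe_shape(sample)
print("\n".join(shape_lines))
```

The top-level loop that follows answers the same question one level deep, which is often enough to get started.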
# What's at the top level?
for key, value in round_data.items():
    if isinstance(value, list):
        print(f"{key}: list of {len(value)} items")
    elif isinstance(value, dict):
        print(f"{key}: dict with keys {list(value.keys())}")
    else:
        print(f"{key}: {type(value).__name__} = {value}")
# Course name: simple string access
print(f"Course: {round_data['courseName']}")
# Total number of shots
shots = round_data["shots"]
print(f"Total shots recorded: {len(shots)}")
# Hole boundaries — tells us which shot indices start each hole
boundaries = round_data["holeBoundaries"]
print(f"Hole boundaries: {boundaries}")
print(f"Number of holes: {len(boundaries)}")
# Look at the first shot: it's a nested dict
first_shot = shots[0]
print(json.dumps(first_shot, indent=2))
# Navigate nested data: club name from a shot
print(f"First shot club: {shots[0]['club']['name']}")
print(f"First shot lat: {shots[0]['coordinate']['latitude']}")
print(f"First shot lon: {shots[0]['coordinate']['longitude']}")
Code: Extracting Unique Clubs
Let’s find every club used during the round. This requires navigating into each shot’s nested club dict.
# Extract unique club names
club_names = set()
for shot in shots:
    club_names.add(shot["club"]["name"])
print(f"Clubs used ({len(club_names)}):")
for name in sorted(club_names):
    print(f" {name}")
# Count how many times each club was used
club_counts = {}
for shot in shots:
    name = shot["club"]["name"]
    club_counts[name] = club_counts.get(name, 0) + 1
# Sort by frequency (most used first)
for club, count in sorted(club_counts.items(), key=lambda x: x[1], reverse=True):
    print(f" {club:20s} {count} shots")
Code: Splitting Shots into Holes
The holeBoundaries array tells us the starting shot index for each hole. For example, [0, 5, 9, 13, ...] means:
- Hole 1: shots at indices 0, 1, 2, 3, 4 (5 shots)
- Hole 2: shots at indices 5, 6, 7, 8 (4 shots)
- Hole 3: shots at indices 9, 10, 11, 12 (4 shots)
- …and so on
We need to pair each boundary with the next one to get the slice for each hole.
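The pairing idea can be tried on toy data first. This is a minimal sketch with made-up shot indices. It uses zip instead of index arithmetic, which is one alternative way to express the same slicing.

```python
# Hypothetical toy data: 13 shots across three holes
toy_shots = list(range(13))
toy_boundaries = [0, 5, 9]

# Append the total shot count so the last hole has an end index too
ends = toy_boundaries[1:] + [len(toy_shots)]
toy_holes = [toy_shots[start:end] for start, end in zip(toy_boundaries, ends)]

for number, hole in enumerate(toy_holes, start=1):
    print(f"Hole {number}: indices {hole}")
```

Appending len(toy_shots) as a final end index removes the need for a special case on the last hole.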
boundaries = round_data["holeBoundaries"]
shots = round_data["shots"]
# Split shots into holes
holes = []
for i in range(len(boundaries)):
    start = boundaries[i]
    # End is the next boundary, or the total shot count for the last hole
    end = boundaries[i + 1] if i + 1 < len(boundaries) else len(shots)
    hole_shots = shots[start:end]
    holes.append(hole_shots)
# Verify: shots per hole (this is the score for each hole)
for i, hole in enumerate(holes, start=1):
    # Check for penalty strokes
    penalties = sum(s.get("penaltyStrokeCount", 0) for s in hole)
    score = len(hole) + penalties
    penalty_str = f" (includes {penalties} penalty)" if penalties else ""
    print(f"Hole {i}: {score} shots{penalty_str}")
total = sum(len(h) + sum(s.get("penaltyStrokeCount", 0) for s in h) for h in holes)
print(f"\nTotal: {total}")
Code: Converting Apple Epoch Timestamps
The timestamps in this data use the Apple epoch — seconds since January 1, 2001 (not the Unix epoch of January 1, 1970). The difference between the two epochs is exactly 978,307,200 seconds.
To convert:
1. Add the offset to get a Unix timestamp
2. Use datetime.fromtimestamp() to convert to a readable date
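The offset does not have to be taken on faith. A short sketch, using only the standard library, derives it from the two reference dates and confirms that an Apple timestamp of 0 lands on January 1, 2001 (UTC).

```python
from datetime import datetime, timezone

unix_epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
apple_epoch = datetime(2001, 1, 1, tzinfo=timezone.utc)

# Seconds between the two reference dates
offset_seconds = int((apple_epoch - unix_epoch).total_seconds())
print(offset_seconds)

# An Apple timestamp of 0 should land exactly on the Apple epoch
converted = datetime.fromtimestamp(0 + offset_seconds, tz=timezone.utc)
print(converted.isoformat())
```

Passing tz=timezone.utc keeps the result independent of the machine's local time zone.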
from datetime import datetime, timezone
# Apple epoch offset: seconds between 1970-01-01 and 2001-01-01
APPLE_EPOCH_OFFSET = 978_307_200
def apple_epoch_to_datetime(apple_timestamp: float) -> datetime:
    """Convert an Apple epoch timestamp to a Python datetime (UTC)."""
    unix_timestamp = apple_timestamp + APPLE_EPOCH_OFFSET
    return datetime.fromtimestamp(unix_timestamp, tz=timezone.utc)
# Convert the round start and end times
start_time = apple_epoch_to_datetime(round_data["startDate"])
end_time = apple_epoch_to_datetime(round_data["endDate"])
duration = end_time - start_time
print(f"Round started: {start_time.strftime('%B %d, %Y at %I:%M %p')}")
print(f"Round ended: {end_time.strftime('%B %d, %Y at %I:%M %p')}")
print(f"Duration: {duration}")
# Time each shot was taken
print("First 5 shots with timestamps:")
for shot in shots[:5]:
    dt = apple_epoch_to_datetime(shot["timestamp"])
    print(f" {dt.strftime('%I:%M:%S %p')} — {shot['club']['name']}")
Code: Calculating Distances (Haversine Formula)
Each shot has GPS coordinates. To calculate the distance between consecutive shots, we need the Haversine formula, which computes the great-circle distance between two points on a sphere (the Earth).
The formula:
\[ a = \sin^2\left(\frac{\Delta\phi}{2}\right) + \cos(\phi_1) \cdot \cos(\phi_2) \cdot \sin^2\left(\frac{\Delta\lambda}{2}\right) \]
\[ d = 2r \cdot \operatorname{atan2}\left(\sqrt{a},\ \sqrt{1 - a}\right) \]
Where:
- \(\phi\) = latitude in radians, \(\lambda\) = longitude in radians
- \(r\) = Earth’s radius (6,371,000 meters)
- \(d\) = distance in meters
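A quick sanity check helps before trusting the formula on real shots. Along a meridian (same longitude) the great-circle distance reduces to plain arc length, \(r \cdot \Delta\phi\), so one degree of latitude should come out at about 111.2 km. The sketch below uses a hypothetical helper, haversine_meters, that returns meters rather than yards.

```python
import math

EARTH_RADIUS_M = 6_371_000  # same mean radius as in the text

def haversine_meters(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters (hypothetical helper for this check)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    )
    return 2 * EARTH_RADIUS_M * math.atan2(math.sqrt(a), math.sqrt(1 - a))

# Along a meridian the formula reduces to arc length: r * delta_phi
one_degree = haversine_meters(0.0, 0.0, 1.0, 0.0)
expected = EARTH_RADIUS_M * math.radians(1.0)
print(f"{one_degree:.1f} m vs {expected:.1f} m")

# Identical points should be zero distance apart
same_point = haversine_meters(40.0, -80.0, 40.0, -80.0)
```

If an implementation skips the degrees-to-radians conversion, this check fails by a wide margin.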
import math
EARTH_RADIUS_METERS = 6_371_000
METERS_TO_YARDS = 1.09361
def haversine(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Calculate distance in yards between two GPS coordinates."""
    # Convert to radians
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    delta_phi = math.radians(lat2 - lat1)
    delta_lambda = math.radians(lon2 - lon1)
    # Haversine formula
    a = (
        math.sin(delta_phi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(delta_lambda / 2) ** 2
    )
    c = 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))
    distance_meters = EARTH_RADIUS_METERS * c
    return distance_meters * METERS_TO_YARDS
# Test: distance between first two shots (Driver → Pitching Wedge)
shot1 = shots[0]
shot2 = shots[1]
dist = haversine(
    shot1["coordinate"]["latitude"], shot1["coordinate"]["longitude"],
    shot2["coordinate"]["latitude"], shot2["coordinate"]["longitude"],
)
print(f"{shot1['club']['name']} → {shot2['club']['name']}: {dist:.0f} yards")
# Calculate distances for all consecutive shots in the round
print(f"{'#':>3s} {'Club':20s} {'Distance':>10s}")
print("-" * 37)
for i in range(len(shots) - 1):
    s1 = shots[i]
    s2 = shots[i + 1]
    dist = haversine(
        s1["coordinate"]["latitude"], s1["coordinate"]["longitude"],
        s2["coordinate"]["latitude"], s2["coordinate"]["longitude"],
    )
    print(f"{i + 1:3d} {s1['club']['name']:20s} {dist:8.0f} yd")
# Last shot has no "next" shot to measure distance to
print(f"{len(shots):3d} {shots[-1]['club']['name']:20s} --")
Code: Building a Hole-by-Hole Summary
Let’s combine everything into a structured summary for each hole: shots taken, clubs used, and total distance.
pin_locations = round_data["holePinLocations"]
hole_summaries = []
for hole_num in range(len(holes)):
    hole_shots = holes[hole_num]
    pin = pin_locations[hole_num]
    # Score = number of shots + any penalty strokes
    penalties = sum(s.get("penaltyStrokeCount", 0) for s in hole_shots)
    score = len(hole_shots) + penalties
    # Clubs used (in order)
    clubs = [s["club"]["name"] for s in hole_shots]
    # Distance: sum of distances between consecutive shots within the hole
    total_distance = 0.0
    for i in range(len(hole_shots) - 1):
        s1 = hole_shots[i]
        s2 = hole_shots[i + 1]
        total_distance += haversine(
            s1["coordinate"]["latitude"], s1["coordinate"]["longitude"],
            s2["coordinate"]["latitude"], s2["coordinate"]["longitude"],
        )
    # Distance from last shot to pin
    last_shot = hole_shots[-1]
    dist_to_pin = haversine(
        last_shot["coordinate"]["latitude"], last_shot["coordinate"]["longitude"],
        pin["latitude"], pin["longitude"],
    )
    total_distance += dist_to_pin
    # Time taken
    start = apple_epoch_to_datetime(hole_shots[0]["timestamp"])
    end = apple_epoch_to_datetime(hole_shots[-1]["timestamp"])
    elapsed_minutes = (end - start).total_seconds() / 60
    summary = {
        "hole": hole_num + 1,
        "score": score,
        "penalties": penalties,
        "clubs": clubs,
        "total_distance_yards": round(total_distance),
        "elapsed_minutes": round(elapsed_minutes, 1),
    }
    hole_summaries.append(summary)
# Display the summary
print(f"{'Hole':>4s} {'Score':>5s} {'Dist':>6s} {'Time':>6s} Clubs")
print("-" * 70)
for h in hole_summaries:
    clubs_str = " → ".join(h["clubs"])
    print(
        f"{h['hole']:4d} {h['score']:5d} {h['total_distance_yards']:5d}y "
        f"{h['elapsed_minutes']:5.1f}m {clubs_str}"
    )
total_score = sum(h["score"] for h in hole_summaries)
total_dist = sum(h["total_distance_yards"] for h in hole_summaries)
print("-" * 70)
print(f"{'OUT':>4s} {total_score:5d} {total_dist:5d}y")
Code: Writing Processed Results to JSON
Now let’s save our processed summary as a new JSON file. This is a common pattern: load raw data, process it, write structured results.
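One detail matters when writing results: datetime objects are not among the six JSON types, so json.dumps() raises TypeError if one appears in the data. Converting with isoformat() beforehand is one option. Another is the default= parameter of json.dumps(), sketched here with a made-up record and a hypothetical fallback function named encode_extra.

```python
import json
from datetime import datetime, timezone

# Hypothetical record containing a datetime value
record = {
    "course": "North Park",
    "start_time": datetime(2024, 6, 1, 14, 30, tzinfo=timezone.utc),
}

# Direct serialization fails: datetime is not a JSON type
try:
    json.dumps(record)
    direct_ok = True
except TypeError:
    direct_ok = False

def encode_extra(value):
    """Fallback for json.dumps: turn datetimes into ISO 8601 strings."""
    if isinstance(value, datetime):
        return value.isoformat()
    raise TypeError(f"Cannot serialize {type(value).__name__}")

encoded = json.dumps(record, default=encode_extra)
print(encoded)
```

Either approach stores the timestamp as a plain string, so whoever reads the file back has to parse it into a datetime again.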
# Build the output structure
processed_round = {
    "course": round_data["courseName"],
    "date": start_time.strftime("%Y-%m-%d"),
    "start_time": start_time.isoformat(),
    "end_time": end_time.isoformat(),
    "total_score": total_score,
    "total_shots_recorded": len(shots),
    "holes": hole_summaries,
}
# Preview it
print(json.dumps(processed_round, indent=2))
# Write to file
output_path = DATA_DIR / "shot-tag-round-summary.json"
with open(output_path, "w") as f:
    json.dump(processed_round, f, indent=2)
print(f"Wrote summary to {output_path}")
print(f"File size: {output_path.stat().st_size:,} bytes")
AI: Working with JSON and Real-World Data
Exercise 1: Apple Epoch Conversion
Ask your AI tool:
“Write a Python function that converts Apple epoch timestamps (seconds since January 1, 2001) to datetime objects. Include type hints, a docstring, and a few test examples.”
Evaluate:
- Does it use the correct offset (978,307,200 seconds)?
- Does it handle timezone awareness?
- Compare the result to our apple_epoch_to_datetime() function above. Did the AI produce anything better or different?
Exercise 2: Haversine Formula
Ask your AI tool:
“Implement the Haversine formula in Python to calculate the distance in yards between two GPS coordinates (latitude, longitude). Include the mathematical formula in the docstring.”
Evaluate:
- Does it use the correct Earth radius (6,371 km)?
- Does it convert degrees to radians?
- Test it with our shot data. Does it produce the same distances as our implementation?
- Check the meters-to-yards conversion factor (1.09361)
Exercise 3: JSON Parser for Scorecard
Give the AI the raw JSON structure and ask it to build a parser:
“I have a JSON file from a golf shot-tracking app. The top-level object has keys: courseName (string), holeBoundaries (list of shot indices marking where each hole starts), holePinLocations (list of {latitude, longitude} for each pin), and shots (list of shot objects with club.name, coordinate.latitude, coordinate.longitude, and optional penaltyStrokeCount). Write a Python function that takes the parsed JSON dict and returns a list of hole dictionaries, each with: hole number, score (including penalties), list of club names used, and the first tee shot club.”
Evaluate:
- Does it handle holeBoundaries correctly, pairing each index with the next to slice the shots list?
- Does it account for penalty strokes?
- Does it handle the last hole correctly (no next boundary, so it needs to use len(shots))?
- Test it on our data and compare to the summary we built above
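A tiny hand-built fixture with a known answer makes these checks easier than eyeballing the full export. Everything in the sketch below is made up for testing: the fixture values, the make_shot helper, and reference_scores, which only computes per-hole scores using the same slicing idea as the walkthrough above.

```python
def make_shot(club, penalties=None):
    """Build a minimal shot dict in the shape the prompt describes."""
    shot = {"club": {"name": club},
            "coordinate": {"latitude": 0.0, "longitude": 0.0}}
    if penalties is not None:
        shot["penaltyStrokeCount"] = penalties
    return shot

# Hypothetical fixture: 5 shots, 2 holes, one penalty on the second hole
fixture = {
    "courseName": "Fixture Course",
    "holeBoundaries": [0, 3],
    "holePinLocations": [{"latitude": 0.0, "longitude": 0.0}] * 2,
    "shots": [make_shot("Driver"), make_shot("6 Iron"), make_shot("Putter"),
              make_shot("Driver", penalties=1), make_shot("Putter")],
}
expected_scores = [3, 3]  # hole 2: two shots plus one penalty stroke

def reference_scores(data):
    """Per-hole scores: shots in each slice plus any penalty strokes."""
    bounds = data["holeBoundaries"] + [len(data["shots"])]
    holes = [data["shots"][a:b] for a, b in zip(bounds, bounds[1:])]
    return [
        len(hole) + sum(s.get("penaltyStrokeCount", 0) for s in hole)
        for hole in holes
    ]

print(reference_scores(fixture) == expected_scores)
```

Run the AI-generated parser on the same fixture and compare its scores to expected_scores. A parser that drops the last hole or ignores penalties will fail here.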
# Paste and test AI-generated code here
Summary
- JSON maps directly to Python: objects → dict, arrays → list, null → None
- json.loads()/json.dumps(): convert between JSON strings and Python objects
- json.load()/json.dump(): read/write JSON files (drop the s for files)
- Nested data is navigated by chaining key access: shot["club"]["name"]
- holeBoundaries pattern: use boundary indices to split a flat list into groups
- Apple epoch: add 978,307,200 seconds to convert to Unix time, then use datetime
- Haversine formula: great-circle distance between GPS coordinates
- JSON vs CSV: use JSON for nested/hierarchical data, CSV for flat tabular data
- Always explore the structure first (keys(), type(), len()) before diving into the data