Parquet Schema Reference
The open-source snapshots contain Apache Parquet files organized in two categories: per-country files (one set per country) and global metadata files (shared across all countries).
File layout

    release/
        country=US/
            links.parquet
            rankings.parquet
            trending.parquet
        metadata/core/
            media_details.parquet
            movies.parquet
            tv_shows.parquet
            ...
        metadata/translations/
            lang_fr.parquet
            ...

Per-country files
links.parquet
Streaming availability links for one country. One row per unique (media, provider, offer) combination. Sorted by `media_id, season_number, episode_number, provider_id, addon_provider_id, link_token` for deterministic output and efficient predicate pushdown.
Sorted by: media_id, season_number, episode_number, provider_id, addon_provider_id, link_token
| Column | Type | Null | Description |
|---|---|---|---|
media_id | UTF8 | — | Internal Popcorn Time media identifier. Alphanumeric string, e.g. "mv00012345" (movie) or "tv00067890" (TV show). |
season_number | INT16 | Yes | Season number for TV show season-level or episode-level links. NULL for movie links and show-level links. |
episode_number | INT16 | Yes | Episode number within the season. NULL for movie, show-level, and season-level links. |
provider_id | UTF8 | — | Streaming provider identifier, e.g. "netflix", "hulu", "disney_plus". Matches the provider registry in providers.json. |
addon_provider_id | UTF8 | Yes | Add-on channel provider identifier (e.g. Showtime on Prime Video). NULL for direct offers. |
link_token | UTF8 | — | AES-256-SIV encrypted, bs58-encoded web URL. Decrypt via go.popcorntime.app/go/{link_token}. Format: bs58([key_version:u8] ++ aes_siv_encrypt(url)). |
available_from | DATE | Yes | Provider-reported availability start date. NULL if not reported. |
available_to | DATE | Yes | Provider-reported availability expiry date. NULL if not reported or no expiry. |
first_seen_at | DATE | Yes | Date the link was first discovered by the spider. NULL for links imported from legacy data. |
price_types | LIST<UTF8> | Yes | Offer price types, sorted alphabetically. Values: flatrate, rent, buy, free, cinema, ads, fast, flatrate_and_buy. |
formats | LIST<UTF8> | Yes | Available video formats, sorted alphabetically. Values: sd, hd, uhd, 4k, 3d. |
audio_languages | LIST<UTF8> | Yes | Available audio language codes (ISO 639-1), sorted alphabetically. Example: `["en","es","fr"]`. |
subtitle_languages | LIST<UTF8> | Yes | Available subtitle language codes (ISO 639-1), sorted alphabetically. |
platforms | LIST<UTF8> | Yes | Platform identifiers that have deep links for this offer, sorted alphabetically. Values: android_tv, fire_tv, ios, roku, webos. Actual URLs are in the private platforms.parquet. |
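
The `link_token` layout described above, `bs58([key_version:u8] ++ aes_siv_encrypt(url))`, can be unpacked without the decryption key. The sketch below only splits a token into its key version and ciphertext. It assumes "bs58" means the Bitcoin base58 alphabet, and the `https://` scheme in `resolver_url` is likewise an assumption. The function names are illustrative, not part of any published client.

```python
# Assumption: "bs58" refers to the Bitcoin base58 alphabet.
B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def b58decode(text: str) -> bytes:
    """Decode a base58 string into raw bytes."""
    num = 0
    for ch in text:
        # str.index raises ValueError for characters outside the alphabet.
        num = num * 58 + B58_ALPHABET.index(ch)
    body = num.to_bytes((num.bit_length() + 7) // 8, "big")
    # Each leading '1' stands for one leading zero byte.
    pad = len(text) - len(text.lstrip("1"))
    return b"\x00" * pad + body


def split_link_token(token: str) -> tuple[int, bytes]:
    """Return (key_version, ciphertext) for a link_token value."""
    raw = b58decode(token)
    if not raw:
        raise ValueError("empty link_token")
    return raw[0], raw[1:]


def resolver_url(token: str) -> str:
    """Build the redirect URL that decrypts the token server-side."""
    # The https:// scheme is an assumption; the reference only gives host and path.
    return f"https://go.popcorntime.app/go/{token}"
```

The ciphertext itself stays opaque: resolving the real URL still goes through the `go.popcorntime.app` endpoint.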
rankings.parquet
Weekly popularity rankings per country. Denormalized with media fields so the API can serve rankings without joins. Rows are pre-sorted by `position ASC`.
Sorted by: position
| Column | Type | Null | Description |
|---|---|---|---|
media_id | UTF8 | — | Internal Popcorn Time media identifier. |
score | INT32 | — | Composite popularity score used for ranking (higher = more popular). |
position | INT32 | — | Rank position in this country, starting at 1 (1 = most popular). |
points | INT32 | — | Raw point count contributing to the score. |
slug | UTF8 | Yes | URL-safe media slug, e.g. "the-dark-knight-2008". NULL if media record not found. |
title | UTF8 | Yes | Primary display title. NULL if media record not found. |
original_title | UTF8 | Yes | Original-language title. NULL if same as title or not available. |
year | UTF8 | Yes | Release year as a string. NULL if not available. |
content_type | UTF8 | Yes | Content type: "movie" or "tv_show". NULL if not available. |
poster | UTF8 | Yes | Poster image path (TMDB CDN relative path). NULL if not available. |
backdrop | UTF8 | Yes | Backdrop image path (TMDB CDN relative path). NULL if not available. |
popularity | UTF8 | Yes | TMDB popularity score as a string. NULL if not available. |
tmdb_rating | UTF8 | Yes | TMDB average rating as a string (0–10). NULL if not available. |
genres | UTF8 | Yes | JSON array of genre slugs sorted alphabetically, e.g. `["action","drama"]`. NULL if no genres. |
providers | UTF8 | Yes | JSON array of provider IDs available in this country, e.g. `["netflix","hulu"]`. NULL if no providers. |
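
Several rankings columns carry structured or numeric data inside UTF8 strings (`genres` and `providers` as JSON arrays, `year` and `tmdb_rating` as text). A minimal sketch of normalizing one row, assuming rows have already been loaded as plain dictionaries and that `year` holds only digits when present; `parse_ranking_row` is a hypothetical helper name.

```python
import json
from typing import Any, Optional


def parse_json_list(value: Optional[str]) -> list:
    """Decode a JSON-array column; NULL (None) becomes an empty list."""
    return json.loads(value) if value else []


def parse_ranking_row(row: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of a rankings row with string-encoded fields converted."""
    out = dict(row)
    out["genres"] = parse_json_list(row.get("genres"))
    out["providers"] = parse_json_list(row.get("providers"))
    # Assumption: year is an all-digit string whenever it is not NULL.
    out["year"] = int(row["year"]) if row.get("year") else None
    out["tmdb_rating"] = float(row["tmdb_rating"]) if row.get("tmdb_rating") else None
    return out
```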
trending.parquet
Daily trending positions per country. `provider_id` is `"__all__"` for aggregate trending (not provider-specific). Rows are pre-sorted by `(source, position)`.
Sorted by: source, position
| Column | Type | Null | Description |
|---|---|---|---|
media_id | UTF8 | — | Internal Popcorn Time media identifier. |
provider_id | UTF8 | — | Provider identifier, or "__all__" for aggregate trending across all providers. |
position | INT32 | — | Trending rank position, starting at 1 (1 = most trending). |
source | UTF8 | — | Trending source: "tmdb_day" or "tmdb_week". |
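
Because aggregate and provider-specific rows share one file, consumers usually need to pick one slice. A small sketch, assuming rows are available as dictionaries keyed by the column names above; `aggregate_trending` is an illustrative name.

```python
def aggregate_trending(rows: list[dict], source: str = "tmdb_day") -> list[dict]:
    """Keep only aggregate ("__all__") rows for one source, ordered by position."""
    picked = [
        r for r in rows
        if r["provider_id"] == "__all__" and r["source"] == source
    ]
    # The file is already sorted by (source, position); sorting again keeps
    # the function correct for rows collected in any order.
    return sorted(picked, key=lambda r: r["position"])
```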
Global metadata
Shared across all countries, under metadata/core/.
media_details.parquet
Core media metadata (movies and TV shows), denormalized. Exported from the local metadata staging database. All columns are nullable UTF8 strings. Sorted by `id` with small row groups (2K rows) for efficient predicate pushdown in the Cloudflare Worker.
Sorted by: id
| Column | Type | Null | Description |
|---|---|---|---|
id | UTF8 | Yes | Internal Popcorn Time media identifier, e.g. "mv00012345" or "tv00067890". |
tmdb_id | UTF8 | Yes | TMDB numeric ID as a string. |
slug | UTF8 | Yes | URL-safe slug, e.g. "the-dark-knight-2008". |
title | UTF8 | Yes | Primary display title. |
original_title | UTF8 | Yes | Original-language title. |
homepage | UTF8 | Yes | Official homepage URL. |
year | UTF8 | Yes | Release year as a string. |
country | UTF8 | Yes | Country of origin (ISO 3166-1 alpha-2). |
budget | UTF8 | Yes | Production budget in USD as a string. |
revenue | UTF8 | Yes | Box office revenue in USD as a string. |
released | UTF8 | Yes | Release date (YYYY-MM-DD). |
content_type | UTF8 | Yes | Content type: "movie" or "tv_show". |
tagline | UTF8 | Yes | Tagline. |
overview | UTF8 | Yes | Plot overview. |
classification | UTF8 | Yes | Age classification, e.g. "PG-13". Country-neutral; see content_ratings for per-country values. |
poster | UTF8 | Yes | Poster image path (TMDB CDN relative path). |
backdrop | UTF8 | Yes | Backdrop image path (TMDB CDN relative path). |
popularity | UTF8 | Yes | TMDB popularity score as a string. |
tmdb_rating | UTF8 | Yes | TMDB average rating as a string (0–10). |
vote_count | UTF8 | Yes | TMDB vote count as a string. |
runtime | UTF8 | Yes | Movie runtime in minutes as a string. NULL for TV shows. |
in_production | UTF8 | Yes | Whether the TV show is still in production. NULL for movies. |
last_air_date | UTF8 | Yes | Last air date for TV shows (YYYY-MM-DD). NULL for movies. |
genres | UTF8 | Yes | JSON array of genre objects: [{"id":28,"slug":"action","name":"Action"},...]. |
ratings | UTF8 | Yes | JSON array of rating objects: [{"source":"imdb","rating":"8.5"},...]. |
external_ids | UTF8 | Yes | JSON array of external ID objects: [{"source":"imdb","id":"tt0468569"},...]. |
videos | UTF8 | Yes | JSON array of video objects: [{"source":"youtube","id":"EXeTwQWrcwY"},...]. |
content_ratings | UTF8 | Yes | JSON object of per-country content ratings: {"US":"PG-13","GB":"12A"}. |
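
Since media_details.parquet is sorted by `id` and split into small row groups, a reader that supports predicate pushdown can skip most of the file when looking up a single title. The sketch below uses the third-party `pyarrow` package (imported lazily so the path helper works without it). The `root` argument and the assumption that `metadata/core/` sits directly under it are illustrative.

```python
from pathlib import Path


def core_file(root: str, name: str) -> Path:
    """Path of a global metadata file, e.g. core_file("release", "media_details")."""
    # Assumption: metadata/core/ lives directly under the snapshot root.
    return Path(root) / "metadata" / "core" / f"{name}.parquet"


def read_media_details(root: str, media_id: str) -> list[dict]:
    """Fetch the row(s) for one media ID, letting the reader prune row groups."""
    import pyarrow.parquet as pq  # third-party dependency, imported on demand

    table = pq.read_table(
        str(core_file(root, "media_details")),
        # The filter is checked against row-group statistics first, so row
        # groups whose id range cannot contain media_id need not be decoded.
        filters=[("id", "=", media_id)],
    )
    return table.to_pylist()
```

A call such as `read_media_details("release", "mv00012345")` would return a list with zero or one dictionaries, each holding the UTF8 columns listed above.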
movies.parquet
Movie-specific metadata fields.
Sorted by: id
| Column | Type | Null | Description |
|---|---|---|---|
id | UTF8 | Yes | Internal media ID (join with media_details.parquet). |
runtime | UTF8 | Yes | Runtime in minutes. |
tv_shows.parquet
TV show-specific metadata fields.
Sorted by: id
| Column | Type | Null | Description |
|---|---|---|---|
id | UTF8 | Yes | Internal media ID (join with media_details.parquet). |
in_production | UTF8 | Yes | Whether the show is still producing new episodes. |
last_air_date | UTF8 | Yes | Last episode air date (ISO 8601). |
seasons.parquet
TV show season metadata.
Sorted by: media_id, season_number
| Column | Type | Null | Description |
|---|---|---|---|
media_id | UTF8 | Yes | Parent TV show media ID. |
season_number | UTF8 | Yes | Season number. |
title | UTF8 | Yes | Season title. |
overview | UTF8 | Yes | Season overview/description. |
poster | UTF8 | Yes | Season poster image path. |
air_date | UTF8 | Yes | First air date of the season (ISO 8601). |
episodes.parquet
TV show episode metadata.
Sorted by: media_id, season_number, episode_number
| Column | Type | Null | Description |
|---|---|---|---|
media_id | UTF8 | Yes | Parent TV show media ID. |
season_number | UTF8 | Yes | Season number. |
episode_number | UTF8 | Yes | Episode number within the season. |
title | UTF8 | Yes | Episode title. |
overview | UTF8 | Yes | Episode overview/description. |
backdrop | UTF8 | Yes | Episode still/backdrop image path. |
air_date | UTF8 | Yes | Air date (ISO 8601). |
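
Note that `season_number` and `episode_number` are UTF8 here but INT16 in links.parquet, so a join between the two needs a common key type. A minimal sketch, assuming both files have been loaded as lists of dictionaries and that the string values are plain decimal integers; the helper names are hypothetical.

```python
from typing import Optional


def _to_int(value) -> Optional[int]:
    """Convert "3" or 3 to 3; keep NULL (None) as None."""
    return None if value is None else int(value)


def episode_key(media_id, season_number, episode_number) -> tuple:
    """Build a join key that is identical for string and integer inputs."""
    return (media_id, _to_int(season_number), _to_int(episode_number))


def index_episodes(episode_rows: list[dict]) -> dict:
    """Index episodes.parquet rows by their normalized key."""
    return {
        episode_key(r["media_id"], r["season_number"], r["episode_number"]): r
        for r in episode_rows
    }


def episode_for_link(link_row: dict, index: dict) -> Optional[dict]:
    """Look up the episode metadata for an episode-level link, if any."""
    key = episode_key(
        link_row["media_id"], link_row["season_number"], link_row["episode_number"]
    )
    return index.get(key)
```

Movie, show-level, and season-level links carry NULL in one or both number columns and therefore find no episode row.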
genres.parquet
Genre definitions.
Sorted by: id
| Column | Type | Null | Description |
|---|---|---|---|
id | UTF8 | Yes | Genre ID. |
slug | UTF8 | Yes | URL-safe slug (e.g. "action", "sci-fi"). |
name | UTF8 | Yes | Display name (e.g. "Action", "Science Fiction"). |
media_genres.parquet
Media-to-genre associations.
| Column | Type | Null | Description |
|---|---|---|---|
media_id | UTF8 | Yes | Internal media ID. |
genre_id | UTF8 | Yes | Genre ID (join with genres.parquet). |
media_ids.parquet
External ID mappings (IMDB, TMDB, TVDB).
| Column | Type | Null | Description |
|---|---|---|---|
media_id | UTF8 | Yes | Internal media ID. |
source | UTF8 | Yes | Source: "imdb", "tmdb", or "tvdb". |
external_id | UTF8 | Yes | External ID value (e.g. "tt0903747" for IMDB). |
media_ratings.parquet
Ratings from external sources.
| Column | Type | Null | Description |
|---|---|---|---|
media_id | UTF8 | Yes | Internal media ID. |
source | UTF8 | Yes | Rating source (e.g. "imdb", "tmdb"). |
external_rating | UTF8 | Yes | Rating value as string (e.g. "8.5"). |
media_rankings.parquet
Per-country ranking scores (aggregated).
| Column | Type | Null | Description |
|---|---|---|---|
media_id | UTF8 | Yes | Internal media ID. |
country | UTF8 | Yes | ISO 3166-1 alpha-2 country code. |
score | UTF8 | Yes | Composite quality score. |
position | UTF8 | Yes | Rank position (1 = most popular). |
points | UTF8 | Yes | Raw points. |
media_content_ratings.parquet
Per-country content ratings (e.g. "TV-MA", "PG-13").
| Column | Type | Null | Description |
|---|---|---|---|
media_id | UTF8 | Yes | Internal media ID. |
country | UTF8 | Yes | ISO 3166-1 alpha-2 country code. |
rating | UTF8 | Yes | Content rating string (e.g. "TV-MA", "PG-13", "15"). |
media_release_dates.parquet
Per-country release dates.
| Column | Type | Null | Description |
|---|---|---|---|
media_id | UTF8 | Yes | Internal media ID. |
country | UTF8 | Yes | ISO 3166-1 alpha-2 country code. |
release_date | UTF8 | Yes | Release date (ISO 8601). |
media_talents.parquet
Cast and crew associations.
| Column | Type | Null | Description |
|---|---|---|---|
media_id | UTF8 | Yes | Internal media ID. |
people_id | UTF8 | Yes | Person ID (join with peoples.parquet). |
role | UTF8 | Yes | Character name (cast) or job title (crew). |
role_type | UTF8 | Yes | "cast" or "crew". |
rank | UTF8 | Yes | Billing order (lower = higher billing). |
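
Because `rank` is stored as a string, sorting it directly orders "10" before "2". The sketch below casts to an integer before ordering the cast list. It assumes rows are dictionaries and that non-NULL `rank` values are decimal integers; `top_cast` is an illustrative name.

```python
def top_cast(talent_rows: list[dict], limit: int = 5) -> list[dict]:
    """Return the first `limit` cast members in billing order."""
    cast = [
        r for r in talent_rows
        if r["role_type"] == "cast" and r["rank"] is not None
    ]
    # Numeric sort: lower rank means higher billing.
    cast.sort(key=lambda r: int(r["rank"]))
    return cast[:limit]
```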
peoples.parquet
Person records (actors, directors, writers, etc.).
| Column | Type | Null | Description |
|---|---|---|---|
id | UTF8 | Yes | Person ID. |
name | UTF8 | Yes | Full name. |
media_videos.parquet
Trailers and video clips.
| Column | Type | Null | Description |
|---|---|---|---|
media_id | UTF8 | Yes | Internal media ID. |
source | UTF8 | Yes | Video source ("youtube" or "rumble"). |
video_id | UTF8 | Yes | Video ID on the source platform. |
providers.parquet
Provider registry with per-country weights, denormalized. One row per (provider, country) combination. Exported from the local metadata staging database.
Sorted by: country, weight
| Column | Type | Null | Description |
|---|---|---|---|
id | UTF8 | Yes | Provider identifier, e.g. "netflix", "disney_plus". |
parent_id | UTF8 | Yes | Parent provider ID for add-on channels (e.g. Prime Video for Showtime). NULL for direct providers. |
name | UTF8 | Yes | Display name, e.g. "Netflix", "Disney+". |
logo | UTF8 | Yes | Logo image URL. |
short_id | UTF8 | Yes | Short identifier used in display contexts. |
country | UTF8 | Yes | Country code (ISO 3166-1 alpha-2) for this weight entry. |
weight | UTF8 | Yes | Display order weight within the country (higher = shown first). |
provider_weights.parquet
Per-country provider popularity weights.
| Column | Type | Null | Description |
|---|---|---|---|
provider_id | UTF8 | Yes | Provider ID (join with providers.parquet). |
country | UTF8 | Yes | ISO 3166-1 alpha-2 country code. |
weight | UTF8 | Yes | Popularity weight (higher = more popular in this country). |
collections.parquet
Curated collections metadata. Collections can be editorial (hand-picked), dynamic (rule-based, evaluated at snapshot time), or thematic (curated around a theme). Only public collections are exported.
Sorted by: position, id
| Column | Type | Null | Description |
|---|---|---|---|
id | UTF8 | Yes | Collection identifier. |
name | UTF8 | Yes | Display name. |
slug | UTF8 | Yes | URL-safe slug. |
description | UTF8 | Yes | Description text. |
country | UTF8 | Yes | Country code this collection is curated for (ISO 3166-1 alpha-2). NULL = global. |
language | UTF8 | Yes | Primary language code for the collection (ISO 639-1). |
category | UTF8 | Yes | Collection category: "editorial", "dynamic", or "thematic". |
position | UTF8 | Yes | Display order (lower = shown first). |
cover_media_id | UTF8 | Yes | Internal media ID used as the collection cover/hero image. NULL if none. |
countries | UTF8 | Yes | JSON array of ISO 3166-1 alpha-2 country codes this collection targets. NULL = all countries. |
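
The `country` and `countries` columns both use NULL to mean "no restriction". One way to test whether a collection applies to a given country is sketched below. How the two columns interact is not stated in this reference, so requiring both to match is an assumption, and `collection_targets` is a hypothetical helper.

```python
import json


def collection_targets(row: dict, country: str) -> bool:
    """Return True if a collections.parquet row applies to `country`."""
    # country: NULL means global, otherwise it must match exactly.
    if row.get("country") not in (None, country):
        return False
    # countries: NULL means all countries, otherwise a JSON array of codes.
    countries = row.get("countries")
    return countries is None or country in json.loads(countries)
```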
media_collections.parquet
Media collection memberships, denormalized with media fields.
Sorted by: collection_id, position
| Column | Type | Null | Description |
|---|---|---|---|
media_id | UTF8 | Yes | Internal Popcorn Time media identifier. |
collection_id | UTF8 | Yes | Collection identifier. |
position | UTF8 | Yes | Display position within the collection (lower = shown first). |
slug | UTF8 | Yes | Media URL-safe slug. |
title | UTF8 | Yes | Media primary display title. |
original_title | UTF8 | Yes | Media original-language title. |
year | UTF8 | Yes | Media release year as a string. |
content_type | UTF8 | Yes | Media content type: "movie" or "tv_show". |
poster | UTF8 | Yes | Media poster image path (TMDB CDN relative path). |
backdrop | UTF8 | Yes | Media backdrop image path (TMDB CDN relative path). |
featured.parquet
Featured homepage content, denormalized with media or collection fields. Editorially curated hero carousel items. Supports both individual media and collections. Managed via the team portal, synced through the pipeline.
Sorted by: country, feature_kind, rank
| Column | Type | Null | Description |
|---|---|---|---|
country | UTF8 | Yes | ISO 3166-1 alpha-2 country code. |
r#type | UTF8 | Yes | Featured item type: "media" or "collection". |
media_id | UTF8 | Yes | Internal media identifier. Set when type = "media". |
collection_id | UTF8 | Yes | Collection identifier. Set when type = "collection". |
rank | UTF8 | Yes | Display rank within the feature kind (lower = shown first). |
feature_kind | UTF8 | Yes | Feature period: "day" (refreshed daily) or "week" (refreshed weekly). |
featured_from | UTF8 | Yes | Date from which this featured entry is active (YYYY-MM-DD). |
slug | UTF8 | Yes | Media or collection URL-safe slug. |
title | UTF8 | Yes | Media title or collection name. |
original_title | UTF8 | Yes | Media original-language title. NULL for collections. |
year | UTF8 | Yes | Media release year. NULL for collections. |
content_type | UTF8 | Yes | Content type: "movie", "tv_show", or "collection". |
tagline | UTF8 | Yes | Editorial tagline override (e.g. "Just landed on Netflix") or media's default tagline. |
overview | UTF8 | Yes | Collection description or media plot overview. |
poster | UTF8 | Yes | Poster image path (TMDB CDN relative path). NULL for collections. |
backdrop | UTF8 | Yes | Backdrop image path (TMDB CDN relative path). NULL for collections. |
popularity | UTF8 | Yes | TMDB popularity score as a string. |
tmdb_rating | UTF8 | Yes | TMDB average rating as a string (0–10). NULL for collections. |
lang_{language}.parquet
Per-language media translations. One file per language, e.g. `translations/lang_en.parquet`. Contains localized title, poster, backdrop, tagline, and overview.
Sorted by: media_id
| Column | Type | Null | Description |
|---|---|---|---|
media_id | UTF8 | Yes | Internal Popcorn Time media identifier. |
title | UTF8 | Yes | Localized title. |
poster | UTF8 | Yes | Localized poster image path (TMDB CDN relative path). |
backdrop | UTF8 | Yes | Localized backdrop image path (TMDB CDN relative path). |
tagline | UTF8 | Yes | Localized tagline. |
overview | UTF8 | Yes | Localized plot overview. |
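
Since every translation column is nullable, a localized view of a title typically overlays the non-NULL translated fields on the base media_details row. A minimal sketch, assuming both rows are dictionaries and that falling back to the base value is the desired behavior; `localize` is an illustrative name.

```python
from typing import Optional

LOCALIZED_FIELDS = ("title", "poster", "backdrop", "tagline", "overview")


def localize(media: dict, translation: Optional[dict]) -> dict:
    """Overlay translated fields on a media row, keeping base values for NULLs."""
    out = dict(media)
    if translation:
        for field in LOCALIZED_FIELDS:
            if translation.get(field) is not None:
                out[field] = translation[field]
    return out
```

Passing `None` as the translation (no row in the language file) returns an unchanged copy of the base row.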