This document defines the research protocol for 52Shuffled, a large-scale crowdsourced experiment collecting verified permutations of time-constrained human card shuffles. The study aims to (a) empirically characterize the randomness quality of brief (4–8 second) card shuffles across a diverse population, (b) measure the distribution of pairwise similarity between independently shuffled decks, and (c) build the first publicly available, video-verified dataset of realistic human shuffle permutations for use in combinatorics, probability theory, cognitive science, and AI research.
Each submission follows a structured three-phase video protocol (fan, shuffle, reveal), undergoes YOLOv8 computer vision detection with mandatory participant confirmation, and is compared against all prior submissions using multiple distance metrics. All raw data, detection outputs, confirmation timestamps, and metadata are preserved to support full reproducibility and future reanalysis.
RO-1: Construct the world's first large-scale, video-verified dataset of human card shuffle permutations, with each permutation linked to its source video evidence.
RO-2: Measure the empirical distribution of pairwise exact-position matches between independently shuffled decks and compare against the theoretical distribution under the assumption of uniform random permutations.
RO-3: Quantify the randomness quality of real human shuffles using established permutation statistics (displacement, fixed points, inversion count, longest ascending subsequence, cycle structure) and characterize the gap between human shuffles and uniformly random permutations.
RO-4: Investigate whether shuffle randomness varies systematically with observable factors including shuffle technique (riffle, overhand, wash, hybrid), shuffle duration, and repeated participation.
RO-5: Evaluate the accuracy and reliability of YOLOv8 object detection for card recognition from uncontrolled video, using post-submission validation as the accuracy measure.
RO-6: Produce an open, well-documented dataset suitable for secondary research in combinatorics, human randomness behaviour, computer vision, and probability education.
The number of distinct permutations of a standard 52-card deck is:

52! = 52 × 51 × 50 × ⋯ × 2 × 1
This is approximately 8.07 × 10^67. For context, the estimated number of atoms in the observable universe is approximately 10^80, and the number of seconds since the Big Bang is approximately 4.35 × 10^17. The space of possible deck orderings is, for all practical purposes, inexhaustible.
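The figure of roughly 8.07 × 10^67 can be verified directly in a couple of lines of Python:

```python
import math

# Exact number of orderings of a 52-card deck.
n_orderings = math.factorial(52)
print(f"{n_orderings:.2e}")  # 8.07e+67
```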
The foundational work on card shuffle randomness is Bayer & Diaconis (1992), "Trailing the Dovetail Shuffle to its Lair," which showed that roughly 7 riffle shuffles are needed to bring a 52-card deck close to uniformly random, with the distance to uniform dropping sharply around that point. Fewer than 7 shuffles leave detectable structure; beyond 7, additional shuffles provide diminishing returns. This result applies specifically to the Gilbert–Shannon–Reeds model of riffle shuffling.
Subsequent work by Diaconis (2003) extended analysis to other shuffle types:
These results predict that most casual human shufflers, who typically perform 2–4 shuffles, produce permutations with significant residual structure. This study aims to empirically validate these predictions at scale.
For two permutations of n elements chosen independently and uniformly at random, the number of exact-position matches (i.e., positions where the same card appears; equivalently, the fixed points of the relative permutation) converges to a Poisson(1) distribution as n → ∞. For n = 52:
| Matches (k) | P(exactly k matches) | P(≥ k matches) |
|---|---|---|
| 0 | 0.3679 | 1.0000 |
| 1 | 0.3679 | 0.6321 |
| 2 | 0.1839 | 0.2642 |
| 3 | 0.0613 | 0.0803 |
| 4 | 0.0153 | 0.0190 |
| 5 | 0.0031 | 0.0037 |
| 6 | 0.0005 | 0.0006 |
| 7+ | < 0.0001 | 0.0001 |
Expected matches between two random permutations = 1.0 (exactly). These theoretical baselines serve as the null hypothesis against which we compare our empirical observations.
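The table values follow from the Poisson(1) probability mass function, P(k) = e^(−1)/k!. A short sketch reproducing them (Poisson(1) is the n → ∞ limit; the exact n = 52 distribution differs only in the far tail):

```python
import math

def poisson1_pmf(k: int) -> float:
    """P(exactly k matches) under Poisson(1): e^-1 / k!"""
    return math.exp(-1) / math.factorial(k)

for k in range(7):
    p_exact = poisson1_pmf(k)
    p_tail = 1 - sum(poisson1_pmf(j) for j in range(k))
    print(f"{k}: P(=k)={p_exact:.4f}  P(>=k)={p_tail:.4f}")
# 0: P(=k)=0.3679  P(>=k)=1.0000
# 1: P(=k)=0.3679  P(>=k)=0.6321
```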
No prior study has collected a large, verified dataset of real human shuffle permutations. Existing research relies on:
This study addresses the gap by collecting empirical data at scale with video verification.
Null: The distribution of permutation statistics (displacement, inversions, fixed points, LAS) in the collected shuffles matches that of uniformly random permutations.
Alternative: Human shuffles show statistically significant deviation from uniform random, with lower displacement, fewer inversions, and longer ascending subsequences.
Test: Kolmogorov-Smirnov test comparing empirical distributions against simulated uniform random distributions. Significance level α = 0.05.
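As an illustrative sketch of the H1 test, the following compares a stand-in "observed" sample of inversion counts against a simulated uniform reference using a hand-rolled two-sample KS statistic. The seed, sample sizes, and the use of a synthetic observed sample are placeholders; in practice a library routine and the study's real permutations would be used.

```python
import bisect
import random

def inversion_count(perm):
    """Kendall tau distance from the identity (O(n^2); fine for n = 52)."""
    n = len(perm)
    return sum(1 for i in range(n) for j in range(i + 1, n) if perm[i] > perm[j])

def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: max |ECDF_a(x) - ECDF_b(x)|."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    for x in set(a) | set(b):
        fa = bisect.bisect_right(a, x) / len(a)
        fb = bisect.bisect_right(b, x) / len(b)
        d = max(d, abs(fa - fb))
    return d

rng = random.Random(2026)  # documented seed, per the reproducibility section

def random_permutation():
    p = list(range(52))
    rng.shuffle(p)
    return p

# Simulated null distribution of inversion counts under uniform shuffling.
reference = [inversion_count(random_permutation()) for _ in range(2000)]
# Placeholder standing in for the study's empirical inversion counts.
observed = [inversion_count(random_permutation()) for _ in range(500)]
D = ks_statistic(observed, reference)
```

With real data, D would be compared against the critical value at α = 0.05 for the given sample sizes.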
Null: The distribution of exact position matches between all pairs of submissions follows Poisson(1).
Alternative: Due to non-random shuffling, the observed distribution shows heavier tails (more high-match pairs) than Poisson(1) predicts.
Test: Chi-squared goodness-of-fit against Poisson(1) distribution.
Null: No relationship between time spent shuffling and permutation randomness metrics.
Alternative: Longer shuffle durations produce permutations closer to uniform random.
Test: Pearson/Spearman correlation between shuffle-phase duration and composite randomness score.
Null: Randomness metrics do not change across sequential submissions from the same participant.
Alternative: Participants who submit multiple shuffles show improvement in randomness scores over time.
Test: Paired t-test / linear mixed-effects model on sequential submissions per user.
This is an observational, crowdsourced, open-enrollment study. Participation is voluntary and open to anyone with:
There is no control group in the traditional sense. Instead, we use simulated uniform random permutations as the reference distribution for comparison.
Minimum viable dataset: 500 verified submissions (sufficient for basic distributional analysis and hypothesis tests H1–H3 at α = 0.05 with power > 0.8).
Target dataset: 10,000+ verified submissions (enables subgroup analysis by shuffle technique, geographic region, and longitudinal analysis for repeat participants).
Pairwise comparisons scale: n submissions produce n(n-1)/2 pairwise comparisons. At 10,000 submissions, this yields ~50 million pairs — sufficient for precise estimation of the empirical match distribution.
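The pairwise scaling is simple arithmetic, sketched as a quick check:

```python
def pair_count(n: int) -> int:
    """Number of unordered pairwise comparisons among n submissions."""
    return n * (n - 1) // 2

print(pair_count(500))     # 124750 pairs at the minimum viable dataset
print(pair_count(10_000))  # 49995000, roughly 50 million pairs
```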
Participants are recruited through:
No compensation is provided. Participation is motivated by:
Each submission follows a mandatory three-phase video recording:
Purpose: Prove the deck is complete and establish the starting order.
Instructions: The participant fans all 52 cards face-up toward the camera.
Captured data: Starting order of the deck (pre-shuffle permutation).
Integrity function: Verifies a complete 52-card deck is present before shuffling begins.
Purpose: The participant shuffles the deck under time constraints insufficient for achieving full randomness.
Instructions: On-screen prompt instructs "Keep shuffling until you see STOP." The duration is randomised between 4 and 8 seconds per session.
Rationale for brief, randomised duration: (1) Time constraints create realistic, incomplete shuffling conditions representative of casual shuffling scenarios, (2) Randomised duration prevents rehearsal of fixed-length shuffle routines, (3) Varying duration reduces the ability to pre-arrange outcomes.
Expected outcome: The brief (4–8 second) duration is deliberately insufficient for achieving uniform randomness, creating an empirical study of human shuffle quality under realistic time constraints.
Captured data: Exact shuffle duration (randomised value) and shuffle technique (observable from video for post-hoc classification).
Purpose: Record the final shuffled order.
Instructions: The participant holds the deck face-down and reveals each card one at a time by removing the top card, showing its face to the camera.
Captured data: The 52-card permutation in the order revealed.
For each submission, the following is captured and stored:
| Field | Type | Source | Purpose |
|---|---|---|---|
| Submission ID | UUID | System-generated | Unique identifier |
| Sequential number | Integer | Auto-increment | Human-readable submission number |
| User ID | UUID | Auth system | Links to participant (pseudonymised) |
| Video file | Binary (WebM/MP4) | Client camera | Primary evidence, audit trail |
| Snapshot frame | Binary (JPEG) | Last video frame | Backup / quick reference |
| AI-extracted card order | Integer[52] | AI pipeline | Raw machine reading |
| AI confidence score | Float (0–1) | AI pipeline | Per-submission confidence |
| User-verified card order | Integer[52] | Human verification | Ground truth permutation |
| Cards corrected by user | Integer[52] boolean map | Diff of AI vs user | AI accuracy measurement |
| Number of corrections | Integer | Computed | Summary metric |
| Shuffle phase duration | Float (seconds) | Video timestamps | Shuffle duration variable |
| Submission timestamp | ISO 8601 | System clock | Temporal data |
| Device fingerprint | Hash | Client-side | Anti-fraud |
| IP address (hashed) | Hash | Server-side | Anti-fraud, geo approximation |
| User agent | String | HTTP header | Device/browser metadata |
Model: YOLOv8s (You Only Look Once, v8 Small), fine-tuned for playing-card detection
Architecture: Dual-stage detection system
A server-side component (`/card-detection/server.py`) for post-processing verification

Input: Video file + extracted frames from Phase 3 (reveal)
Output: JSON array of 52 card codes in reveal order, plus per-card confidence scores
Cost: $0 (self-hosted open-source model)
The detection pipeline:
    {
      "cards": ["KC", "7H", "AS", "2D", ...],   // 52 card codes
      "confidence": [0.95, 0.88, 0.92, ...],    // per-card confidence (0–1)
      "overall_confidence": 0.91                // average confidence
    }
Card code format: {Rank}{Suit} where Rank ∈ {A, 2, 3, 4, 5, 6, 7, 8, 9, 10, J, Q, K} and Suit ∈ {S, H, D, C}.
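A minimal parser for this code format (the helper name and error handling are illustrative, not the platform's actual code; note that the "10" rank makes codes either two or three characters long):

```python
RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = {"S": "Spades", "H": "Hearts", "D": "Diamonds", "C": "Clubs"}

def parse_card_code(code: str) -> tuple[str, str]:
    """Split a {Rank}{Suit} code such as 'KC' or '10H' into (rank, suit)."""
    rank, suit = code[:-1], code[-1]
    if rank not in RANKS or suit not in SUITS:
        raise ValueError(f"invalid card code: {code!r}")
    return rank, suit

parse_card_code("KC")   # ('K', 'C')
parse_card_code("10H")  # ('10', 'H')
```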
Detection accuracy is measured through post-submission validation (duplicate/missing card detection). We track:
This data is preserved as a secondary dataset for computer vision research.
All detection outputs are stored with:
If the detection pipeline is updated (model retrained, architecture change), all submissions retain their original detection data. Re-detection with improved models is logged as a separate record, not an overwrite.
After computer vision detection, the participant is presented with a 4×13 visual grid displaying the detected card order. The participant:
Critical anti-fraud design: Users cannot edit individual cards. This prevents manipulation of the final order to achieve desired match scores. The detection is either accepted as-is or the entire submission is rejected.
The confirmed detection becomes the authoritative permutation stored as ground truth.
Even after participant confirmation, the system performs automated validation to catch detection errors:
If the confirmed order contains the same card multiple times (e.g., the King of Spades appearing twice):
If fewer than 52 unique cards are detected or any standard card is missing:
Only submissions with exactly 52 unique standard cards proceed to the database. Rejected submissions are logged separately with full data (video, detection output, rejection reason).
Rationale: This post-submission validation compensates for the inability to verify deck completeness in Phase 1. Combined with no-edit confirmation, it ensures only valid 52-card permutations enter the research dataset while preventing user manipulation.
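The completeness check above can be sketched as follows (the function name and report format are assumptions, not the production implementation):

```python
from collections import Counter

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["S", "H", "D", "C"]
FULL_DECK = {r + s for s in SUITS for r in RANKS}  # the 52 standard card codes

def validate_reveal(cards: list[str]) -> tuple[bool, dict]:
    """Accept only orders containing each of the 52 standard cards exactly once."""
    counts = Counter(cards)
    duplicates = sorted(c for c, n in counts.items() if n > 1)
    missing = sorted(FULL_DECK - counts.keys())
    unknown = sorted(counts.keys() - FULL_DECK)
    ok = len(cards) == 52 and not duplicates and not missing and not unknown
    return ok, {"duplicates": duplicates, "missing": missing, "unknown": unknown}
```

A submission passes only when the report lists no duplicates, no missing cards, and no unrecognised codes.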
Concern: Could detection errors compromise data quality?
Mitigations:
Design philosophy: Prioritise fraud prevention over detection accuracy. Requiring users to reject and resubmit the occasional misdetected reading is preferable to letting them edit cards and manipulate match scores.
If detection fails to identify cards with sufficient confidence:
Note: Users cannot edit individual cards, only accept or reject the entire detection.
Each verified permutation σ (where σ(i) = card at position i, with i = 0..51 representing the sorted deck order) is analysed for:
Mean absolute displacement from the identity permutation:

D = (1/n) Σᵢ |σ(i) − i|

Expected value under uniform random: E[D] = (n² − 1)/(3n) ≈ 17.3 for n = 52
Interpretation: Higher = more displaced from original order = better shuffle
Count of positions where σ(i) = i (card remains in original position):

F = |{i : σ(i) = i}|
Expected value under uniform random: E[F] = 1.0
Distribution: Approximately Poisson(1) for large n
Interpretation: More fixed points = worse shuffle
Number of pairwise inversions from the identity:

K = |{(i, j) : i < j and σ(i) > σ(j)}|
Expected value under uniform random: E[K] = n(n-1)/4 = 663
Range: 0 (identity) to 1,326 (reverse)
Interpretation: Closer to 663 = more random
Count of maximal ascending consecutive subsequences:
Expected value under uniform random: E[R] = (n+1)/2 = 26.5
Interpretation: Closer to 26.5 = more random; near 1 = nearly sorted
Length of the longest strictly increasing subsequence (not necessarily contiguous).
Expected value under uniform random: E[LAS] ≈ 2√n ≈ 14.4
Interpretation: Shorter = more random; near 52 = nearly sorted
Decomposition of σ into disjoint cycles. We record:
Expected number of cycles under uniform random: H52 ≈ 4.54 (52nd harmonic number)
Mean distance between consecutive appearances of cards from the same suit.
Expected value under uniform random: E[SC] = 52/13 = 4.0
Interpretation: Near 4.0 = well-distributed suits; near 1.0 = highly clustered
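A reference implementation of the per-permutation statistics above, with result keys mirroring the public dataset field names (function name is illustrative; suit clustering is omitted for brevity):

```python
import bisect

def permutation_stats(sigma: list[int]) -> dict:
    """Per-permutation metrics for a permutation sigma of 0..51."""
    n = len(sigma)
    displacement = sum(abs(sigma[i] - i) for i in range(n)) / n
    fixed_points = sum(1 for i in range(n) if sigma[i] == i)
    inversions = sum(1 for i in range(n)
                     for j in range(i + 1, n) if sigma[i] > sigma[j])
    runs = 1 + sum(1 for i in range(1, n) if sigma[i] < sigma[i - 1])

    # Longest ascending subsequence via patience sorting, O(n log n).
    tails: list[int] = []
    for v in sigma:
        k = bisect.bisect_left(tails, v)
        if k == len(tails):
            tails.append(v)
        else:
            tails[k] = v

    # Cycle decomposition: follow i -> sigma[i] until each loop closes.
    seen = [False] * n
    cycles = []
    for i in range(n):
        if not seen[i]:
            length, j = 0, i
            while not seen[j]:
                seen[j] = True
                j = sigma[j]
                length += 1
            cycles.append(length)

    return {
        "displacement_mean": displacement,
        "fixed_points": fixed_points,
        "inversion_count": inversions,
        "ascending_runs": runs,
        "longest_ascending_subseq": len(tails),
        "cycle_count": len(cycles),
        "longest_cycle": max(cycles),
    }
```

On the identity permutation this yields 52 fixed points, 0 inversions, and a single ascending run; on the reversed deck, 1,326 inversions and an LAS of 1.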
A normalised composite score (0–100) combining all of the above metrics.
Computed by converting each metric to a percentile rank against 100,000 simulated uniform random permutations, then averaging the percentiles.
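A sketch of the percentile-rank construction for a single metric (the seed, reduced sample size, and helper names are illustrative; the production pipeline uses 100,000 simulated permutations and averages such percentiles across all metrics):

```python
import random

def inversion_count(perm: list[int]) -> int:
    n = len(perm)
    return sum(1 for i in range(n) for j in range(i + 1, n) if perm[i] > perm[j])

def percentile_rank(value: float, reference: list[float]) -> float:
    """Percentage of reference values at or below `value` (0-100)."""
    return 100.0 * sum(1 for r in reference if r <= value) / len(reference)

rng = random.Random(0)  # documented seed, per the reproducibility section
reference = []
for _ in range(5000):   # reduced from 100,000 so the sketch runs quickly
    p = list(range(52))
    rng.shuffle(p)
    reference.append(inversion_count(p))

# Percentile rank of one metric for a single (here simulated) shuffle.
example = list(range(52))
rng.shuffle(example)
score = percentile_rank(inversion_count(example), reference)
```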
For every pair of submissions (σ1, σ2):
M(σ1, σ2) = |{i : σ1(i) = σ2(i)}|
This is the primary comparison metric and the basis for the "closest pair" record.
Length of the longest contiguous subsequence where σ1 and σ2 agree.
Number of pairwise disagreements between σ1 and σ2 (not relative to identity, but to each other).
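Minimal implementations of the three pairwise metrics (function names are illustrative):

```python
def exact_matches(a: list[int], b: list[int]) -> int:
    """M(sigma1, sigma2): positions holding the same card."""
    return sum(1 for x, y in zip(a, b) if x == y)

def longest_common_run(a: list[int], b: list[int]) -> int:
    """Longest contiguous stretch of positions where the two decks agree."""
    best = cur = 0
    for x, y in zip(a, b):
        cur = cur + 1 if x == y else 0
        best = max(best, cur)
    return best

def pairwise_disagreements(a: list[int], b: list[int]) -> int:
    """Kendall tau distance between the two orders: card pairs whose
    relative order differs. Computed via each card's position in b."""
    pos_b = {card: i for i, card in enumerate(b)}
    rel = [pos_b[card] for card in a]
    n = len(rel)
    return sum(1 for i in range(n) for j in range(i + 1, n) if rel[i] > rel[j])
```

Identical decks score 52 matches and 0 disagreements; a deck and its reverse score 0 matches and 1,326 disagreements.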
| Data | Storage | Retention | Access |
|---|---|---|---|
| Video files | DigitalOcean Spaces (S3-compatible) | Indefinite | Private; accessible only via presigned URLs |
| Snapshot images | DigitalOcean Spaces | Indefinite | Private |
| Permutation data | PostgreSQL | Indefinite | Application + research export |
| AI raw responses | PostgreSQL (JSONB) | Indefinite | Application |
| User corrections | PostgreSQL | Indefinite | Application + research export |
| Metadata | PostgreSQL | Indefinite | Application + research export |
Every state change on a submission is logged:
| Threat | Risk | Mitigation |
|---|---|---|
| Pre-arranged deck (not actually shuffled) | High | Randomised shuffle duration, video evidence, statistical detection |
| Gallery upload (pre-recorded/fabricated video) | Medium | Camera-only capture via MediaRecorder API, no file input |
| Editing the verified order to fake a match | Medium | Video retained for re-verification, corrections logged as diff |
| Multiple accounts submitting same arrangement | Low | Card order hash deduplication, device fingerprinting |
| Automated/bot submissions | Low | Rate limiting, device fingerprinting, CAPTCHA (if needed) |
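One way the card-order deduplication hash from the table above could be computed (the serialisation and digest choice are assumptions, not the platform's actual scheme):

```python
import hashlib

def card_order_hash(order: list[int]) -> str:
    """Deterministic digest of a 52-card permutation for duplicate detection.
    Serialises the order as comma-separated integer IDs, then hashes it."""
    payload = ",".join(str(c) for c in order).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()
```

Two submissions with byte-identical orderings produce the same digest, so exact duplicates can be flagged with a single indexed lookup.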
Flagged submissions are:
| Tier | Criteria | Use |
|---|---|---|
| Gold | Detection confidence ≥ 0.9, randomness score ≥ 50 | Primary dataset, all analyses |
| Silver | Detection confidence ≥ 0.7 | Primary dataset with annotation |
| Bronze | Detection confidence < 0.7 | Secondary dataset, detection accuracy analysis only |
| Flagged | Triggered anomaly detection | Excluded from primary dataset |
Before first submission, participants are presented with a consent screen explaining:
Consent is recorded with a timestamp. Participation cannot proceed without consent.
Participants may at any time:
Deletion requests are honoured within 30 days.
The platform does not knowingly collect data from users under 13. Age verification is not actively enforced beyond the terms of service.
This protocol has been designed to support submission to an Institutional Review Board or ethics committee if required for academic collaboration. Key considerations:
Funding Model: This research is independently funded through a hybrid sustainability model:
Independence Statement:
Conflicts of Interest: None declared. The research team has no financial relationships with card manufacturers, casino operators, or gaming companies that could bias study design or results.
Transparency: All funding sources and revenue models are disclosed in participant consent, privacy policy, and published academic papers.
    Collection → Extraction → Verification → Storage → Analysis → Publication
        ↓            ↓             ↓            ↓          ↓           ↓
      Video      AI output    User review  PostgreSQL  Internal  Anonymised
      upload   + confidence + corrections   + Spaces   analytics   dataset
When the dataset reaches sufficient size (target: 1,000+ Gold-tier submissions), we will publish:
The dataset uses a tiered licensing model to balance open science principles with responsible data stewardship:
License: Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)
Permissions:
Rationale: This license maximizes research accessibility while preventing commercial exploitation. The ShareAlike clause ensures derivative datasets remain openly available to the research community.
Access Model: Controlled Access via Data Use Agreement (DUA)
Requirements:
Access Decisions: The research team reserves the right to:
Rationale: The Secondary Dataset contains detailed technical metadata that, while anonymized, warrants controlled access to ensure responsible use and prevent potential misuse in adversarial machine learning contexts (e.g., training systems to defeat card detection).
Commercial entities seeking access (e.g., game developers, EdTech platforms, AI training) must:
Commercial licensing revenue, if any, will be reinvested in research infrastructure and dataset expansion.
While open science is a core principle, responsible data stewardship requires balancing accessibility with:
This tiered approach is standard practice in large-scale research datasets (e.g., UK Biobank, Human Connectome Project, ImageNet).
Participants self-select. The sample is not representative of the general population. It skews toward:
Participants know their shuffle is being measured and compared. This may lead to:
This is acknowledged and discussed in any analysis. It does not invalidate the data — it characterises "shuffling when trying."
The starting order of each participant's deck is unknown (unless identifiable from the fan phase). Most participants' decks will not start in sorted order — they are already in some shuffled state from prior use.
Standard decks from different manufacturers may vary in card size, finish, face design, and wear, affecting shuffle mechanics and AI recognition.
The YOLOv8 detection pipeline is not perfect. The confirmation-only verification system means:
We report detection accuracy metrics alongside the dataset and maintain video files for potential re-detection with improved models.
Video quality varies across devices and environments. Poor lighting, low resolution, or shaky video reduces detection accuracy and may require participants to re-submit.
This protocol document is version-controlled and timestamped. Any amendments are recorded in the revision history.
The analysis code, scoring algorithms, and comparison engine are open source and available in the project repository. Any researcher can:
The published dataset includes sufficient raw data (52-integer permutations) for any researcher to independently compute all derived metrics.
All comparisons against "uniform random" use reproducible simulations with documented seeds. The simulation code and parameters are published alongside the dataset.
| Field | Type | Description |
|---|---|---|
| submission_id | Integer | Sequential submission number (1, 2, 3, ...) |
| card_order | Integer[52] | The verified permutation. Each integer 0–51 encodes a card. |
| quality_tier | Enum | gold, silver, bronze |
| detection_confidence | Float | Detection model confidence score (0–1) |
| shuffle_duration_seconds | Float | Duration of shuffle phase |
| randomness_score | Float | Composite randomness score (0–100) |
| displacement_mean | Float | Mean absolute displacement from reference order |
| fixed_points | Integer | Cards in original position |
| inversion_count | Integer | Kendall tau distance from identity |
| ascending_runs | Integer | Number of ascending runs |
| longest_ascending_subseq | Integer | Length of longest ascending subsequence |
| cycle_count | Integer | Number of cycles in permutation |
| longest_cycle | Integer | Length of longest cycle |
| suit_clustering_index | Float | Mean inter-suit-card distance |
| submission_date | Date | Date of submission (no time) |
| closest_match_count | Integer | Exact position matches with nearest other submission |
| closest_match_id | Integer | Submission ID of nearest match |
| ID Range | Suit | Cards |
|---|---|---|
| 0–12 | Spades (S) | A♠, 2♠, 3♠, ..., K♠ |
| 13–25 | Hearts (H) | A♥, 2♥, 3♥, ..., K♥ |
| 26–38 | Diamonds (D) | A♦, 2♦, 3♦, ..., K♦ |
| 39–51 | Clubs (C) | A♣, 2♣, 3♣, ..., K♣ |
Within each suit: 0=Ace, 1=Two, 2=Three, ..., 9=Ten, 10=Jack, 11=Queen, 12=King.
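The encoding maps directly to arithmetic: suit index times 13 plus rank index (helper names are illustrative):

```python
RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["S", "H", "D", "C"]  # 0-12 Spades, 13-25 Hearts, 26-38 Diamonds, 39-51 Clubs

def card_id(rank: str, suit: str) -> int:
    """Map a card to its 0-51 integer ID per the encoding table."""
    return SUITS.index(suit) * 13 + RANKS.index(rank)

def id_to_card(card: int) -> str:
    """Inverse mapping: integer ID back to a {Rank}{Suit} code."""
    return RANKS[card % 13] + SUITS[card // 13]

card_id("A", "S")   # 0
card_id("K", "C")   # 51
id_to_card(13)      # 'AH'
```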
| Term | Definition |
|---|---|
| Permutation | An ordered arrangement of all 52 cards. Represented as σ where σ(i) = card in position i. |
| Identity permutation | The sorted deck order (A–K of Spades, A–K of Hearts, A–K of Diamonds, A–K of Clubs). |
| Reference order/position | The baseline against which displacement and fixed points are measured. If the starting deck order is identifiable from the fan phase, the reference is that starting order. Otherwise, the identity permutation (sorted order) serves as a consistent reference frame. See Section 14.3. |
| Fixed point | A position i where σ(i) = i; the card did not move from its reference position. |
| Displacement | The absolute distance a card moved from its reference position: |σ(i) - i|. |
| Inversion | A pair (i, j) where i < j but σ(i) > σ(j). Measures disorder. |
| Kendall tau distance | Total number of inversions in a permutation. |
| Ascending run | A maximal contiguous increasing subsequence within the permutation. |
| Cycle | In permutation theory, a subset of elements that map to each other in a closed loop under the permutation. |
| Derangement | A permutation with zero fixed points. |
| Poisson(1) | The Poisson distribution with mean 1; the limiting distribution of the number of exact-position matches between two independent uniformly random permutations. |
| Version | Date | Changes |
|---|---|---|
| 1.0 | 2026-02-13 | Initial protocol document |