This document defines the research protocol for 52Shuffled, a large-scale crowdsourced experiment collecting verified permutations of time-constrained human card shuffles. The study aims to (a) empirically characterize the randomness quality of brief (4–8 second) card shuffles across a diverse population, (b) measure the distribution of pairwise similarity between independently shuffled decks, and (c) build the first publicly available, video-verified dataset of realistic human shuffle permutations for use in combinatorics, probability theory, cognitive science, and AI research.
Each submission follows a structured three-phase video protocol (fan, shuffle, reveal), undergoes YOLOv8 computer vision detection with mandatory participant confirmation, and is compared against all prior submissions using multiple distance metrics. All raw data, detection outputs, confirmation timestamps, and metadata are preserved to support full reproducibility and future reanalysis.
RO-1: Construct the world's first large-scale, video-verified dataset of human card shuffle permutations, with each permutation linked to its source video evidence.
RO-2: Measure the empirical distribution of pairwise exact-position matches between independently shuffled decks and compare against the theoretical distribution under the assumption of uniform random permutations.
RO-3: Quantify the randomness quality of real human shuffles using established permutation statistics (displacement, fixed points, inversion count, longest ascending subsequence, cycle structure) and characterize the gap between human shuffles and uniformly random permutations.
RO-4: Investigate whether shuffle randomness varies systematically with observable factors including shuffle technique (riffle, overhand, wash, hybrid), shuffle duration, and repeated participation.
RO-5: Evaluate the accuracy and reliability of YOLOv8 object detection for card recognition from uncontrolled video, using post-submission validation as the accuracy measure.
RO-6: Produce an open, well-documented dataset suitable for secondary research in combinatorics, human randomness behaviour, computer vision, and probability education.
The number of distinct permutations of a standard 52-card deck is:

52! = 52 × 51 × 50 × ⋯ × 2 × 1
This is approximately 8.07 × 10^67. For context, the estimated number of atoms in the observable universe is approximately 10^80, and the number of seconds since the Big Bang is approximately 4.35 × 10^17. The space of possible deck orderings is, for all practical purposes, inexhaustible.
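The figure of roughly 8.07 × 10^67 can be verified directly in a couple of lines of Python:

```python
import math

# Exact number of orderings of a 52-card deck.
n_orderings = math.factorial(52)
print(f"{n_orderings:.2e}")  # 8.07e+67
```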
The foundational work on card shuffle randomness is Bayer & Diaconis (1992), "Trailing the Dovetail Shuffle to its Lair," which showed that roughly 7 riffle shuffles are needed to bring a 52-card deck close to uniformly random, with the distance to uniform dropping sharply around that point. Fewer than 7 shuffles leave detectable structure; beyond 7, additional shuffles provide diminishing returns. This result applies specifically to the Gilbert–Shannon–Reeds model of riffle shuffling.
Subsequent work by Diaconis (2003) extended analysis to other shuffle types:
These results predict that most casual human shufflers, who typically perform 2–4 shuffles, produce permutations with significant residual structure. This study aims to empirically validate these predictions at scale.
For two permutations of n elements chosen independently and uniformly at random, the number of exact-position matches (i.e., positions where the same card appears; equivalently, the fixed points of the relative permutation) converges to a Poisson(1) distribution as n → ∞. For n = 52:
| Matches (k) | P(exactly k matches) | P(≥ k matches) |
|---|---|---|
| 0 | 0.3679 | 1.0000 |
| 1 | 0.3679 | 0.6321 |
| 2 | 0.1839 | 0.2642 |
| 3 | 0.0613 | 0.0803 |
| 4 | 0.0153 | 0.0190 |
| 5 | 0.0031 | 0.0037 |
| 6 | 0.0005 | 0.0006 |
| 7+ | < 0.0001 | 0.0001 |
Expected matches between two random permutations = 1.0 (exactly). These theoretical baselines serve as the null hypothesis against which we compare our empirical observations.
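The table values follow from the Poisson(1) probability mass function, P(k) = e^(−1)/k!. A short sketch reproducing them (Poisson(1) is the n → ∞ limit; the exact n = 52 distribution differs only in the far tail):

```python
import math

def poisson1_pmf(k: int) -> float:
    """P(exactly k matches) under Poisson(1): e^-1 / k!"""
    return math.exp(-1) / math.factorial(k)

for k in range(7):
    p_exact = poisson1_pmf(k)
    p_tail = 1 - sum(poisson1_pmf(j) for j in range(k))
    print(f"{k}: P(=k)={p_exact:.4f}  P(>=k)={p_tail:.4f}")
# 0: P(=k)=0.3679  P(>=k)=1.0000
# 1: P(=k)=0.3679  P(>=k)=0.6321
```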
No prior study has collected a large, verified dataset of real human shuffle permutations. Existing research relies on:
This study addresses the gap by collecting empirical data at scale with video verification.
Null: The distribution of permutation statistics (displacement, inversions, fixed points, LAS) in the collected shuffles matches that of uniformly random permutations.
Alternative: Human shuffles show statistically significant deviation from uniform random, with lower displacement, fewer inversions, and longer ascending subsequences.
Test: Kolmogorov-Smirnov test comparing empirical distributions against simulated uniform random distributions. Significance level α = 0.05.
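As an illustrative sketch of the H1 test, the following compares a stand-in "observed" sample of inversion counts against a simulated uniform reference using a hand-rolled two-sample KS statistic. The seed, sample sizes, and the use of a synthetic observed sample are placeholders; in practice a library routine and the study's real permutations would be used.

```python
import bisect
import random

def inversion_count(perm):
    """Kendall tau distance from the identity (O(n^2); fine for n = 52)."""
    n = len(perm)
    return sum(1 for i in range(n) for j in range(i + 1, n) if perm[i] > perm[j])

def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: max |ECDF_a(x) - ECDF_b(x)|."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    for x in set(a) | set(b):
        fa = bisect.bisect_right(a, x) / len(a)
        fb = bisect.bisect_right(b, x) / len(b)
        d = max(d, abs(fa - fb))
    return d

rng = random.Random(2026)  # documented seed, per the reproducibility section

def random_permutation():
    p = list(range(52))
    rng.shuffle(p)
    return p

# Simulated null distribution of inversion counts under uniform shuffling.
reference = [inversion_count(random_permutation()) for _ in range(2000)]
# Placeholder standing in for the study's empirical inversion counts.
observed = [inversion_count(random_permutation()) for _ in range(500)]
D = ks_statistic(observed, reference)
```

With real data, D would be compared against the critical value at α = 0.05 for the given sample sizes.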
Null: The distribution of exact position matches between all pairs of submissions follows Poisson(1).
Alternative: Due to non-random shuffling, the observed distribution shows heavier tails (more high-match pairs) than Poisson(1) predicts.
Test: Chi-squared goodness-of-fit against Poisson(1) distribution.
Null: No relationship between time spent shuffling and permutation randomness metrics.
Alternative: Longer shuffle durations produce permutations closer to uniform random.
Test: Pearson/Spearman correlation between shuffle-phase duration and composite randomness score.
Null: Randomness metrics do not change across sequential submissions from the same participant.
Alternative: Participants who submit multiple shuffles show improvement in randomness scores over time.
Test: Paired t-test / linear mixed-effects model on sequential submissions per user.
This is an observational, crowdsourced, open-enrollment study. Participation is voluntary and open to anyone with:
There is no control group in the traditional sense. Instead, we use simulated uniform random permutations as the reference distribution for comparison.
Minimum viable dataset: 500 verified submissions (sufficient for basic distributional analysis and hypothesis tests H1–H3 at α = 0.05 with power > 0.8).
Target dataset: 10,000+ verified submissions (enables subgroup analysis by shuffle technique, geographic region, and longitudinal analysis for repeat participants).
Pairwise comparisons scale: n submissions produce n(n-1)/2 pairwise comparisons. At 10,000 submissions, this yields ~50 million pairs — sufficient for precise estimation of the empirical match distribution.
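The pairwise scaling is simple arithmetic, sketched as a quick check:

```python
def pair_count(n: int) -> int:
    """Number of unordered pairwise comparisons among n submissions."""
    return n * (n - 1) // 2

print(pair_count(500))     # 124750 pairs at the minimum viable dataset
print(pair_count(10_000))  # 49995000, roughly 50 million pairs
```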
Participants are recruited through:
No compensation is provided. Participation is motivated by:
Each submission follows a mandatory three-phase video recording:
Purpose: Prove the deck is complete and establish the starting order.
Instructions: The participant fans all 52 cards face-up toward the camera.
Captured data: Starting order of the deck (pre-shuffle permutation).
Integrity function: Verifies a complete 52-card deck is present before shuffling begins.
Purpose: The participant shuffles the deck under time constraints insufficient for achieving full randomness.
Instructions: On-screen prompt instructs "Keep shuffling until you see STOP." The duration is randomised between 4 and 8 seconds per session.
Rationale for brief, randomised duration: (1) Time constraints create realistic, incomplete shuffling conditions representative of casual shuffling scenarios, (2) Randomised duration prevents rehearsal of fixed-length shuffle routines, (3) Varying duration reduces the ability to pre-arrange outcomes.
Expected outcome: The brief (4–8 second) duration is deliberately insufficient for achieving uniform randomness, creating an empirical study of human shuffle quality under realistic time constraints.
Captured data: Exact shuffle duration (randomised value) and shuffle technique (observable from video for post-hoc classification).
Purpose: Record the final shuffled order.
Instructions: The participant holds the deck face-down and reveals each card one at a time by removing the top card, showing its face to the camera.
Captured data: The 52-card permutation in the order revealed.
For each submission, the following is captured and stored:
| Field | Type | Source | Purpose |
|---|---|---|---|
| Submission ID | UUID | System-generated | Unique identifier |
| Sequential number | Integer | Auto-increment | Human-readable submission number |
| User ID | UUID | Auth system | Links to participant (pseudonymised) |
| Video file | Binary (WebM/MP4) | Client camera | Primary evidence, audit trail |
| Snapshot frame | Binary (JPEG) | Last video frame | Backup / quick reference |
| AI-extracted card order | Integer[52] | AI pipeline | Raw machine reading |
| AI confidence score | Float (0–1) | AI pipeline | Per-submission confidence |
| User-verified card order | Integer[52] | Human verification | Ground truth permutation |
| Cards corrected by user | Integer[52] boolean map | Diff of AI vs user | AI accuracy measurement |
| Number of corrections | Integer | Computed | Summary metric |
| Shuffle phase duration | Float (seconds) | Video timestamps | Shuffle duration variable |
| Submission timestamp | ISO 8601 | System clock | Temporal data |
| Device fingerprint | Hash | Client-side | Anti-fraud |
| IP address (hashed) | Hash | Server-side | Anti-fraud, geo approximation |
| User agent | String | HTTP header | Device/browser metadata |
Model: YOLOv8s (You Only Look Once, v8 Small), fine-tuned for playing-card detection
Architecture: Dual-stage detection system
A server-side component (`/card-detection/server.py`) for post-processing verification

Input: Video file + extracted frames from Phase 3 (reveal)
Output: JSON array of 52 card codes in reveal order, plus per-card confidence scores
Cost: $0 (self-hosted open-source model)
The detection pipeline:
    {
      "cards": ["KC", "7H", "AS", "2D", ...],   // 52 card codes
      "confidence": [0.95, 0.88, 0.92, ...],    // per-card confidence (0–1)
      "overall_confidence": 0.91                // average confidence
    }
Card code format: {Rank}{Suit} where Rank ∈ {A, 2, 3, 4, 5, 6, 7, 8, 9, 10, J, Q, K} and Suit ∈ {S, H, D, C}.
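A minimal parser for this code format (the helper name and error handling are illustrative, not the platform's actual code; note that the "10" rank makes codes either two or three characters long):

```python
RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = {"S": "Spades", "H": "Hearts", "D": "Diamonds", "C": "Clubs"}

def parse_card_code(code: str) -> tuple[str, str]:
    """Split a {Rank}{Suit} code such as 'KC' or '10H' into (rank, suit)."""
    rank, suit = code[:-1], code[-1]
    if rank not in RANKS or suit not in SUITS:
        raise ValueError(f"invalid card code: {code!r}")
    return rank, suit

parse_card_code("KC")   # ('K', 'C')
parse_card_code("10H")  # ('10', 'H')
```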
Detection accuracy is measured through post-submission validation (duplicate/missing card detection). We track:
This data is preserved as a secondary dataset for computer vision research.
All detection outputs are stored with:
If the detection pipeline is updated (model retrained, architecture change), all submissions retain their original detection data. Re-detection with improved models is logged as a separate record, not an overwrite.
After computer vision detection, the participant is presented with a 4×13 visual grid displaying the detected card order. The participant:
Critical anti-fraud design: Users cannot edit individual cards. This prevents manipulation of the final order to achieve desired match scores. The detection is either accepted as-is or the entire submission is rejected.
The confirmed detection becomes the authoritative permutation stored as ground truth.
Even after participant confirmation, the system performs automated validation to catch detection errors:
If the confirmed order contains the same card multiple times (e.g., the King of Spades appearing twice):
If fewer than 52 unique cards are detected or any standard card is missing:
Only submissions with exactly 52 unique standard cards proceed to the database. Rejected submissions are logged separately with full data (video, detection output, rejection reason).
Rationale: This post-submission validation compensates for the inability to verify deck completeness in Phase 1. Combined with no-edit confirmation, it ensures only valid 52-card permutations enter the research dataset while preventing user manipulation.
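The completeness check above can be sketched as follows (the function name and report format are assumptions, not the production implementation):

```python
from collections import Counter

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["S", "H", "D", "C"]
FULL_DECK = {r + s for s in SUITS for r in RANKS}  # the 52 standard card codes

def validate_reveal(cards: list[str]) -> tuple[bool, dict]:
    """Accept only orders containing each of the 52 standard cards exactly once."""
    counts = Counter(cards)
    duplicates = sorted(c for c, n in counts.items() if n > 1)
    missing = sorted(FULL_DECK - counts.keys())
    unknown = sorted(counts.keys() - FULL_DECK)
    ok = len(cards) == 52 and not duplicates and not missing and not unknown
    return ok, {"duplicates": duplicates, "missing": missing, "unknown": unknown}
```

A submission passes only when the report lists no duplicates, no missing cards, and no unrecognised codes.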
Concern: Could detection errors compromise data quality?
Mitigations:
Design philosophy: Prioritise fraud prevention over detection accuracy. Requiring users to reject and resubmit the occasional misdetected reading is preferable to letting them edit cards and manipulate match scores.
If detection fails to identify cards with sufficient confidence:
Note: Users cannot edit individual cards, only accept or reject the entire detection.
Each verified permutation σ (where σ(i) = card at position i, with i = 0..51 representing the sorted deck order) is analysed for:
Mean absolute displacement from the identity permutation:

D = (1/n) Σᵢ |σ(i) − i|

Expected value under uniform random: E[D] = (n² − 1)/(3n) ≈ 17.3 for n = 52
Interpretation: Higher = more displaced from original order = better shuffle
Count of positions where σ(i) = i (card remains in original position):

F = |{i : σ(i) = i}|
Expected value under uniform random: E[F] = 1.0
Distribution: Approximately Poisson(1) for large n
Interpretation: More fixed points = worse shuffle
Number of pairwise inversions from the identity:

K = |{(i, j) : i < j and σ(i) > σ(j)}|
Expected value under uniform random: E[K] = n(n-1)/4 = 663
Range: 0 (identity) to 1,326 (reverse)
Interpretation: Closer to 663 = more random
Count of maximal ascending consecutive subsequences:
Expected value under uniform random: E[R] = (n+1)/2 = 26.5
Interpretation: Closer to 26.5 = more random; near 1 = nearly sorted
Length of the longest strictly increasing subsequence (not necessarily contiguous).
Expected value under uniform random: E[LAS] ≈ 2√n ≈ 14.4
Interpretation: Shorter = more random; near 52 = nearly sorted
Decomposition of σ into disjoint cycles. We record:
Expected number of cycles under uniform random: H52 ≈ 4.54 (52nd harmonic number)
Mean distance between consecutive appearances of cards from the same suit.
Expected value under uniform random: E[SC] = 52/13 = 4.0
Interpretation: Near 4.0 = well-distributed suits; near 1.0 = highly clustered
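A reference implementation of the per-permutation statistics above, with result keys mirroring the public dataset field names (function name is illustrative; suit clustering is omitted for brevity):

```python
import bisect

def permutation_stats(sigma: list[int]) -> dict:
    """Per-permutation metrics for a permutation sigma of 0..51."""
    n = len(sigma)
    displacement = sum(abs(sigma[i] - i) for i in range(n)) / n
    fixed_points = sum(1 for i in range(n) if sigma[i] == i)
    inversions = sum(1 for i in range(n)
                     for j in range(i + 1, n) if sigma[i] > sigma[j])
    runs = 1 + sum(1 for i in range(1, n) if sigma[i] < sigma[i - 1])

    # Longest ascending subsequence via patience sorting, O(n log n).
    tails: list[int] = []
    for v in sigma:
        k = bisect.bisect_left(tails, v)
        if k == len(tails):
            tails.append(v)
        else:
            tails[k] = v

    # Cycle decomposition: follow i -> sigma[i] until each loop closes.
    seen = [False] * n
    cycles = []
    for i in range(n):
        if not seen[i]:
            length, j = 0, i
            while not seen[j]:
                seen[j] = True
                j = sigma[j]
                length += 1
            cycles.append(length)

    return {
        "displacement_mean": displacement,
        "fixed_points": fixed_points,
        "inversion_count": inversions,
        "ascending_runs": runs,
        "longest_ascending_subseq": len(tails),
        "cycle_count": len(cycles),
        "longest_cycle": max(cycles),
    }
```

On the identity permutation this yields 52 fixed points, 0 inversions, and a single ascending run; on the reversed deck, 1,326 inversions and an LAS of 1.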
A normalised composite score (0–100) combining all of the above metrics.
Computed by converting each metric to a percentile rank against 100,000 simulated uniform random permutations, then averaging the percentiles.
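A sketch of the percentile-rank construction for a single metric (the seed, reduced sample size, and helper names are illustrative; the production pipeline uses 100,000 simulated permutations and averages such percentiles across all metrics):

```python
import random

def inversion_count(perm: list[int]) -> int:
    n = len(perm)
    return sum(1 for i in range(n) for j in range(i + 1, n) if perm[i] > perm[j])

def percentile_rank(value: float, reference: list[float]) -> float:
    """Percentage of reference values at or below `value` (0-100)."""
    return 100.0 * sum(1 for r in reference if r <= value) / len(reference)

rng = random.Random(0)  # documented seed, per the reproducibility section
reference = []
for _ in range(5000):   # reduced from 100,000 so the sketch runs quickly
    p = list(range(52))
    rng.shuffle(p)
    reference.append(inversion_count(p))

# Percentile rank of one metric for a single (here simulated) shuffle.
example = list(range(52))
rng.shuffle(example)
score = percentile_rank(inversion_count(example), reference)
```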
For every pair of submissions (σ1, σ2):
M(σ1, σ2) = |{i : σ1(i) = σ2(i)}|
This is the primary comparison metric and the basis for the "closest pair" record.
Length of the longest contiguous subsequence where σ1 and σ2 agree.
Number of pairwise disagreements between σ1 and σ2 (not relative to identity, but to each other).
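Minimal implementations of the three pairwise metrics (function names are illustrative):

```python
def exact_matches(a: list[int], b: list[int]) -> int:
    """M(sigma1, sigma2): positions holding the same card."""
    return sum(1 for x, y in zip(a, b) if x == y)

def longest_common_run(a: list[int], b: list[int]) -> int:
    """Longest contiguous stretch of positions where the two decks agree."""
    best = cur = 0
    for x, y in zip(a, b):
        cur = cur + 1 if x == y else 0
        best = max(best, cur)
    return best

def pairwise_disagreements(a: list[int], b: list[int]) -> int:
    """Kendall tau distance between the two orders: card pairs whose
    relative order differs. Computed via each card's position in b."""
    pos_b = {card: i for i, card in enumerate(b)}
    rel = [pos_b[card] for card in a]
    n = len(rel)
    return sum(1 for i in range(n) for j in range(i + 1, n) if rel[i] > rel[j])
```

Identical decks score 52 matches and 0 disagreements; a deck and its reverse score 0 matches and 1,326 disagreements.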
| Data | Storage | Retention | Access |
|---|---|---|---|
| Video files | DigitalOcean Spaces (S3-compatible) | Indefinite | Private; accessible only via presigned URLs |
| Snapshot images | DigitalOcean Spaces | Indefinite | Private |
| Permutation data | PostgreSQL | Indefinite | Application + research export |
| AI raw responses | PostgreSQL (JSONB) | Indefinite | Application |
| User corrections | PostgreSQL | Indefinite | Application + research export |
| Metadata | PostgreSQL | Indefinite | Application + research export |
Every state change on a submission is logged:
| Threat | Risk | Mitigation |
|---|---|---|
| Pre-arranged deck (not actually shuffled) | High | Randomised shuffle duration, video evidence, statistical detection |
| Gallery upload (pre-recorded/fabricated video) | Medium | Camera-only capture via MediaRecorder API, no file input |
| Editing the verified order to fake a match | Medium | Video retained for re-verification, corrections logged as diff |
| Multiple accounts submitting same arrangement | Low | Card order hash deduplication, device fingerprinting |
| Automated/bot submissions | Low | Rate limiting, device fingerprinting, CAPTCHA (if needed) |
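One way the card-order deduplication hash from the table above could be computed (the serialisation and digest choice are assumptions, not the platform's actual scheme):

```python
import hashlib

def card_order_hash(order: list[int]) -> str:
    """Deterministic digest of a 52-card permutation for duplicate detection.
    Serialises the order as comma-separated integer IDs, then hashes it."""
    payload = ",".join(str(c) for c in order).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()
```

Two submissions with byte-identical orderings produce the same digest, so exact duplicates can be flagged with a single indexed lookup.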
Flagged submissions are:
| Tier | Criteria | Use |
|---|---|---|
| Gold | Detection confidence ≥ 0.9, randomness score ≥ 50 | Primary dataset, all analyses |
| Silver | Detection confidence ≥ 0.7 | Primary dataset with annotation |
| Bronze | Detection confidence < 0.7 | Secondary dataset, detection accuracy analysis only |
| Flagged | Triggered anomaly detection | Excluded from primary dataset |
Before first submission, participants are presented with a consent screen explaining:
Consent is recorded with a timestamp. Participation cannot proceed without consent.
Participants may at any time:
Deletion requests are honoured within 30 days.
The platform does not knowingly collect data from users under 13. Age verification is not actively enforced beyond the terms of service.
This protocol has been designed to support submission to an Institutional Review Board or ethics committee if required for academic collaboration. Key considerations:
Funding Model: This research is independently funded through a hybrid sustainability model:
Independence Statement:
Conflicts of Interest: None declared. The research team has no financial relationships with card manufacturers, casino operators, or gaming companies that could bias study design or results.
Transparency: All funding sources and revenue models are disclosed in participant consent, privacy policy, and published academic papers.
    Collection → Extraction → Verification → Storage → Analysis → Publication
        ↓            ↓             ↓            ↓          ↓           ↓
      Video      AI output    User review  PostgreSQL  Internal  Anonymised
      upload   + confidence + corrections   + Spaces   analytics   dataset
When the dataset reaches sufficient size (target: 1,000+ Gold-tier submissions), we will publish:
The dataset uses a tiered licensing model to balance open science principles with responsible data stewardship:
License: Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)
Permissions:
Rationale: This license maximizes research accessibility while preventing commercial exploitation. The ShareAlike clause ensures derivative datasets remain openly available to the research community.
Access Model: Controlled Access via Data Use Agreement (DUA)
Requirements:
Access Decisions: The research team reserves the right to:
Rationale: The Secondary Dataset contains detailed technical metadata that, while anonymized, warrants controlled access to ensure responsible use and prevent potential misuse in adversarial machine learning contexts (e.g., training systems to defeat card detection).
Commercial entities seeking access (e.g., game developers, EdTech platforms, AI training) must:
Commercial licensing revenue, if any, will be reinvested in research infrastructure and dataset expansion.
While open science is a core principle, responsible data stewardship requires balancing accessibility with:
This tiered approach is standard practice in large-scale research datasets (e.g., UK Biobank, Human Connectome Project, ImageNet).
Participants self-select. The sample is not representative of the general population. It skews toward:
Participants know their shuffle is being measured and compared. This may lead to:
This is acknowledged and discussed in any analysis. It does not invalidate the data — it characterises "shuffling when trying."
The starting order of each participant's deck is unknown (unless identifiable from the fan phase). Most participants' decks will not start in sorted order — they are already in some shuffled state from prior use.
Standard decks from different manufacturers may vary in card size, finish, face design, and wear, affecting shuffle mechanics and AI recognition.
The YOLOv8 detection pipeline is not perfect. The confirmation-only verification system means:
We report detection accuracy metrics alongside the dataset and maintain video files for potential re-detection with improved models.
Video quality varies across devices and environments. Poor lighting, low resolution, or shaky video reduces detection accuracy and may require participants to re-submit.
This protocol document is version-controlled and timestamped. Any amendments are recorded in the revision history.
The analysis code, scoring algorithms, and comparison engine are open source and available in the project repository. Any researcher can:
The published dataset includes sufficient raw data (52-integer permutations) for any researcher to independently compute all derived metrics.
All comparisons against "uniform random" use reproducible simulations with documented seeds. The simulation code and parameters are published alongside the dataset.
| Field | Type | Description |
|---|---|---|
| submission_id | Integer | Sequential submission number (1, 2, 3, ...) |
| card_order | Integer[52] | The verified permutation. Each integer 0–51 encodes a card. |
| quality_tier | Enum | gold, silver, bronze |
| detection_confidence | Float | Detection model confidence score (0–1) |
| shuffle_duration_seconds | Float | Duration of shuffle phase |
| randomness_score | Float | Composite randomness score (0–100) |
| displacement_mean | Float | Mean absolute displacement from reference order |
| fixed_points | Integer | Cards in original position |
| inversion_count | Integer | Kendall tau distance from identity |
| ascending_runs | Integer | Number of ascending runs |
| longest_ascending_subseq | Integer | Length of longest ascending subsequence |
| cycle_count | Integer | Number of cycles in permutation |
| longest_cycle | Integer | Length of longest cycle |
| suit_clustering_index | Float | Mean inter-suit-card distance |
| submission_date | Date | Date of submission (no time) |
| closest_match_count | Integer | Exact position matches with nearest other submission |
| closest_match_id | Integer | Submission ID of nearest match |
| ID Range | Suit | Cards |
|---|---|---|
| 0–12 | Spades (S) | A♠, 2♠, 3♠, ..., K♠ |
| 13–25 | Hearts (H) | A♥, 2♥, 3♥, ..., K♥ |
| 26–38 | Diamonds (D) | A♦, 2♦, 3♦, ..., K♦ |
| 39–51 | Clubs (C) | A♣, 2♣, 3♣, ..., K♣ |
Within each suit: 0=Ace, 1=Two, 2=Three, ..., 9=Ten, 10=Jack, 11=Queen, 12=King.
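The encoding maps directly to arithmetic: suit index times 13 plus rank index (helper names are illustrative):

```python
RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["S", "H", "D", "C"]  # 0-12 Spades, 13-25 Hearts, 26-38 Diamonds, 39-51 Clubs

def card_id(rank: str, suit: str) -> int:
    """Map a card to its 0-51 integer ID per the encoding table."""
    return SUITS.index(suit) * 13 + RANKS.index(rank)

def id_to_card(card: int) -> str:
    """Inverse mapping: integer ID back to a {Rank}{Suit} code."""
    return RANKS[card % 13] + SUITS[card // 13]

card_id("A", "S")   # 0
card_id("K", "C")   # 51
id_to_card(13)      # 'AH'
```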
| Term | Definition |
|---|---|
| Permutation | An ordered arrangement of all 52 cards. Represented as σ where σ(i) = card in position i. |
| Identity permutation | The sorted deck order (A–K of Spades, A–K of Hearts, A–K of Diamonds, A–K of Clubs). |
| Reference order/position | The baseline against which displacement and fixed points are measured. If the starting deck order is identifiable from the fan phase, the reference is that starting order. Otherwise, the identity permutation (sorted order) serves as a consistent reference frame. See Section 14.3. |
| Fixed point | A position i where σ(i) = i; the card did not move from its reference position. |
| Displacement | The absolute distance a card moved from its reference position: |σ(i) - i|. |
| Inversion | A pair (i, j) where i < j but σ(i) > σ(j). Measures disorder. |
| Kendall tau distance | Total number of inversions in a permutation. |
| Ascending run | A maximal contiguous increasing subsequence within the permutation. |
| Cycle | In permutation theory, a subset of elements that map to each other in a closed loop under the permutation. |
| Derangement | A permutation with zero fixed points. |
| Poisson(1) | The Poisson distribution with mean 1; the limiting distribution of the number of exact-position matches between two independent uniformly random permutations. |
| Version | Date | Changes |
|---|---|---|
| 1.0 | 2026-02-13 | Initial protocol document |