Recognize playing cards in an image


I'm trying to recognize <a href="http://www.worldofmunchkin.com/moregoodcards/img/cards.jpg" rel="nofollow">Munchkin cards</a> from the card game. I've tried a variety of image recognition APIs (Google Vision API, vize.ai, Azure's Computer Vision API and more), but none of them works well.

Each can recognize a card when it is the only one in the demo image, but when two cards appear together it fails to identify one or the other.

I've trained the APIs with a set of about 40 different images per card, with different angles, backgrounds and lighting.

I've also tried OCR (via the Google Vision API), which works only for some cards, probably because of the small lettering and the lack of detail on some cards. Does anyone know of a way I can teach one of these APIs (or another one) to read these cards better? Or is there a different way to recognize the cards?

The goal is for a user to capture an image while playing the game, and for the application to work out which cards are in front of them and return the results. Thank you.


You can try this: <a href="https://docs.microsoft.com/en-us/azure/cognitive-services/computer-vision/quickstarts/csharp#OCR" rel="nofollow">https://docs.microsoft.com/en-us/azure/cognitive-services/computer-vision/quickstarts/csharp#OCR</a>. It will detect the text, and you can then apply your own logic (based on the detected text) to decide what to do.
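One possible shape for that custom logic, as a minimal sketch: OCR output on small card text tends to be noisy, so rather than comparing strings exactly you can match each detected line against your list of known card titles by edit distance. Nothing here belongs to any OCR API; the function names, the sample titles and the 0.3 error threshold are assumptions you would need to tune.

```swift
/// Levenshtein edit distance between two strings (two-row dynamic programming).
func editDistance(_ a: String, _ b: String) -> Int {
    let s = Array(a), t = Array(b)
    if s.isEmpty { return t.count }
    if t.isEmpty { return s.count }
    var previous = Array(0...t.count)
    for i in 1...s.count {
        var current = [i] + Array(repeating: 0, count: t.count)
        for j in 1...t.count {
            let substitution = previous[j - 1] + (s[i - 1] == t[j - 1] ? 0 : 1)
            current[j] = min(previous[j] + 1, current[j - 1] + 1, substitution)
        }
        previous = current
    }
    return previous[t.count]
}

/// Lowercases and keeps only letters, digits and single spaces,
/// so stray OCR punctuation matters less.
func normalize(_ text: String) -> String {
    let kept = text.lowercased().filter { $0.isLetter || $0.isNumber || $0 == " " }
    return kept.split(separator: " ").joined(separator: " ")
}

/// Returns the known card title closest to one OCR line, or nil if nothing is close enough.
/// `maxErrorRatio` is an illustrative threshold, not a tuned value.
func bestCardMatch(forOCRLine line: String,
                   in cardNames: [String],
                   maxErrorRatio: Double = 0.3) -> String? {
    let query = normalize(line)
    guard !query.isEmpty else { return nil }
    var bestName: String? = nil
    var bestRatio = Double.infinity
    for name in cardNames {
        let candidate = normalize(name)
        let distance = editDistance(query, candidate)
        let ratio = Double(distance) / Double(max(query.count, candidate.count))
        if ratio < bestRatio {
            bestRatio = ratio
            bestName = name
        }
    }
    return bestRatio <= maxErrorRatio ? bestName : nil
}

// Sample titles for illustration only.
let knownCards = ["Boots of Butt-Kicking", "Potion of Halitosis", "Chainsaw of Bloody Dismemberment"]
let guess = bestCardMatch(forOCRLine: "B00ts of Butt-Kickinq", in: knownCards)
```

Matching against a closed list of titles means the OCR only has to get most of a title right, which may help with the cards that have small lettering.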


You are heading in the wrong direction. As I understand it, you have an image that contains several Munchkin cards (two in your example). So the job is not only recognition: the cards also have to be detected first. Split the work into a card detection task and a card text recognition task.

For each task you can use the following approach:

1. Card detection: simple color segmentation (or, if you have enough time and patience, train an SSD to detect the cards).
2. Card text recognition: use Tesseract with an English dictionary. Rotating each detected card upright first could improve accuracy.
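The detection step can be sketched in plain Swift, assuming you have already thresholded the photo into a binary mask (true where a pixel looks card-colored). The sketch groups connected pixels and returns one bounding box per region; each box can then be cropped and handed to the text recognition step. `Box`, `cardBoxes` and `minArea` are hypothetical names, and a real photo would need a proper color threshold and probably some clean-up first.

```swift
/// Axis-aligned bounding box in pixel coordinates (inclusive bounds).
struct Box: Equatable {
    var minX: Int, minY: Int, maxX: Int, maxY: Int
}

/// Finds bounding boxes of 4-connected regions of `true` pixels in a binary mask.
/// Assumes a rectangular mask (every row has the same length).
/// Regions smaller than `minArea` pixels are dropped as noise.
func cardBoxes(in mask: [[Bool]], minArea: Int = 4) -> [Box] {
    guard let width = mask.first?.count, width > 0 else { return [] }
    let height = mask.count
    var visited = Array(repeating: Array(repeating: false, count: width), count: height)
    var boxes: [Box] = []

    for startY in 0..<height {
        for startX in 0..<width where mask[startY][startX] && !visited[startY][startX] {
            var box = Box(minX: startX, minY: startY, maxX: startX, maxY: startY)
            var area = 0
            var stack = [(startX, startY)]
            visited[startY][startX] = true

            // Iterative flood fill from the first unvisited pixel of the region.
            while let point = stack.popLast() {
                let (x, y) = point
                area += 1
                box.minX = min(box.minX, x)
                box.maxX = max(box.maxX, x)
                box.minY = min(box.minY, y)
                box.maxY = max(box.maxY, y)
                for (nx, ny) in [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)] {
                    guard nx >= 0, nx < width, ny >= 0, ny < height else { continue }
                    guard mask[ny][nx], !visited[ny][nx] else { continue }
                    visited[ny][nx] = true
                    stack.append((nx, ny))
                }
            }
            if area >= minArea { boxes.append(box) }
        }
    }
    return boxes
}

// Toy mask: two 2x3 "cards" and one isolated noise pixel.
let rows = ["........",
            ".##..##.",
            ".##..##.",
            ".##..##.",
            ".......#"]
let mask = rows.map { row in row.map { $0 == "#" } }
let boxes = cardBoxes(in: mask)
```

If two cards touch or overlap in the photo they will merge into one region, which is one reason a trained detector such as SSD may be worth the effort.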

Hope that helps.


What a coincidence! I've recently done something very similar – <a href="https://twitter.com/LinguaBrowse/status/1044578942559039488" rel="nofollow">link to video</a> – with great success! Specifically, I was trying to recognise and track Chinese-language Munchkin cards to replace them with English ones. I used iOS's ARKit 2 (requires an iPhone 6S or higher; or a relatively new iPad; and isn't supported on desktop).

I basically just followed the Augmented Reality Photo Frame demo 41 minutes into WWDC 2018's <a href="https://developer.apple.com/videos/play/wwdc2018/602/" rel="nofollow">What's New in ARKit 2</a> presentation. My code below is a minor adaptation of theirs (it merely replaces the target with a static image rather than a video). The tedious part was scanning all the cards in both languages, cropping them out, and adding them as AR resources...

Here's my source code, ViewController.swift:

    import UIKit
    import SceneKit
    import ARKit
    import Foundation

    class ViewController: UIViewController, ARSCNViewDelegate {

        @IBOutlet var sceneView: ARSCNView!

        override func viewDidLoad() {
            super.viewDidLoad()

            // Set the view's delegate
            sceneView.delegate = self

            // Show statistics such as fps and timing information
            sceneView.showsStatistics = true

            sceneView.scene = SCNScene()
        }

        override func viewWillAppear(_ animated: Bool) {
            super.viewWillAppear(animated)

            // Create a configuration
            let configuration = ARImageTrackingConfiguration()

            guard let trackingImages = ARReferenceImage.referenceImages(inGroupNamed: "card_scans", bundle: Bundle.main) else {
                print("Could not load images")
                return
            }

            // Setup configuration
            configuration.trackingImages = trackingImages
            configuration.maximumNumberOfTrackedImages = 16

            // Run the view's session
            sceneView.session.run(configuration)
        }

        override func viewWillDisappear(_ animated: Bool) {
            super.viewWillDisappear(animated)

            // Pause the view's session
            sceneView.session.pause()
        }

        // MARK: - ARSCNViewDelegate

        // Override to create and configure nodes for anchors added to the view's session.
        public func renderer(_ renderer: SCNSceneRenderer, nodeFor anchor: ARAnchor) -> SCNNode? {
            let node = SCNNode()

            if let imageAnchor = anchor as? ARImageAnchor {
                // Create a plane the same physical size as the recognized card
                let plane = SCNPlane(width: imageAnchor.referenceImage.physicalSize.width,
                                     height: imageAnchor.referenceImage.physicalSize.height)
                print("Asset identified as: \(anchor.name ?? "nil")")

                // Set UIImage as the plane's texture
                plane.firstMaterial?.diffuse.contents = UIImage(named: "replacementImage.png")

                let planeNode = SCNNode(geometry: plane)

                // Rotate the plane to match the anchor
                planeNode.eulerAngles.x = -.pi / 2

                node.addChildNode(planeNode)
            }

            return node
        }

        func session(_ session: ARSession, didFailWithError error: Error) {
            // Present an error message to the user
        }

        func sessionWasInterrupted(_ session: ARSession) {
            // Inform the user that the session has been interrupted, for example, by presenting an overlay
        }

        func sessionInterruptionEnded(_ session: ARSession) {
            // Reset tracking and/or remove existing anchors if consistent tracking is required
        }
    }

<a href="https://i.stack.imgur.com/Brm51.jpg" rel="nofollow"><img alt="Card detection and replacement" src="https://i.stack.imgur.com/Brm51.jpg" /></a>
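The listing above textures every recognized card with the same `replacementImage.png`. To swap each Chinese card for its own English counterpart, one option is to derive the replacement asset name from the name of the recognized reference image. This is only a sketch under an assumed naming convention that is not in my project: scans named `zh_<cardID>` and replacement images named `en_<cardID>.png`. The helper name is hypothetical too.

```swift
/// Maps the name of a recognized scan to the name of its replacement image.
/// Assumed (hypothetical) convention: "zh_<cardID>" -> "en_<cardID>.png".
/// Returns nil when the name is missing or does not follow the convention.
func replacementImageName(forScanNamed scanName: String?) -> String? {
    let prefix = "zh_"
    guard let scanName = scanName, scanName.hasPrefix(prefix) else { return nil }
    let cardID = scanName.dropFirst(prefix.count)
    return cardID.isEmpty ? nil : "en_\(cardID).png"
}

let example = replacementImageName(forScanNamed: "zh_boots")
```

Inside `renderer(_:nodeFor:)` you would pass `imageAnchor.referenceImage.name` to this helper and fall back to the static image when it returns nil.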

Unfortunately, I met a limitation: card recognition becomes rife with false positives the more cards you add as AR targets to distinguish from (to clarify: not the number of targets simultaneously onscreen, but the library size of potential targets). While a 9-target library performed with 100% success rate, it didn't scale to a 68-target library (which is all the Munchkin treasure cards). The app tended to flit between 1-3 potential guesses when faced with each target. Seeing the poor performance, I didn't take the effort to add all 168 Munchkin cards in the end.

I used Chinese cards as the targets, which are all monochrome; I believe it could have performed better if I'd used the English cards as targets (as they are full-colour, and thus have richer histograms), but on my initial inspection of a 9-card set in each language, I was receiving as many warnings for the AR resources being hard to distinguish for English as I was for Chinese. So I don't think the performance would improve so far as to scale reliably to the full 168-card set.

Vuforia (which integrates with Unity) would be another way to approach this, but it again has a hard limit of 50-100 targets. With a commercial licence (eye-wateringly expensive), you can delegate target recognition to cloud computers, which could be a viable route for this approach.

Thanks for investigating the OCR and ML approaches – they would've been my next ports of call. If you find any other promising approaches, please do leave a message here!

