
Question:
i'm trying to recognize <a href="http://www.worldofmunchkin.com/moregoodcards/img/cards.jpg" rel="nofollow">munchkin cards</a> from the card game. i've been trying to use a variety of image recognition APIs(google vision api, vize.ai, azure's computer vision api and more), but none of them seem to work ok.<br /> they're able to recognize one of the cards when only one appears in the demo image, but when both appear with another one it fails to identify one or the other.<br /> i've trained the APIs with a set of about 40 different images per card, with different angles, backgrounds and lighting.<br /> i've also tried using ocr(via google vision api) which works only for some cards, probably due to small letters and not much details on some cards. Does anyone know of a way i can teach one of these APIs(or another) to read these cards better? or perhaps recognize cards in a different way?
the outcome should be a user capturing an image while playing the game and have the application understand which cards he has in front of him and return the results.<br /> thank you.
Answer1:You can try this: <a href="https://docs.microsoft.com/en-us/azure/cognitive-services/computer-vision/quickstarts/csharp#OCR" rel="nofollow">https://docs.microsoft.com/en-us/azure/cognitive-services/computer-vision/quickstarts/csharp#OCR</a>. It will detect text and then you can have your custom logic (based on detected text) to handle actions.
Answer2:You are going to wrong direction. As i understand, you have an image. And inside that image, there are several munchkin cards (2 in your example). It is not just only "Recognition" but also "Card detection" is needed. So your task should be divided into card detection task and card's text recognition task
For each task you can use the following algorithm
1. Card detection task
Simple color segmentation
( if you have enough time and patient, train SSD to detect card)
2. Card's text recognition
Use tesseract with english dictionary
(You could add some card rotating process to improve accuracy)
Hope that help
Answer3:What a coincidence! I've recently done something very similar – <a href="https://twitter.com/LinguaBrowse/status/1044578942559039488" rel="nofollow">link to video</a> – with great success! Specifically, I was trying to recognise and track Chinese-language Munchkin cards to replace them with English ones. I used iOS's ARKit 2 (requires an iPhone 6S or higher; or a relatively new iPad; and isn't supported on desktop).
I basically just followed the Augmented Reality Photo Frame demo 41 minutes into WWDC 2018's <a href="https://developer.apple.com/videos/play/wwdc2018/602/" rel="nofollow">What's New in ARKit 2</a> presentation. My code below is a minor adaptation to theirs (merely replacing the target with a static image rather than a video). The tedious part was scanning all the cards in both languages, cropping them out, and adding them as AR resources...
Here's my source code, ViewController.swift
:
import UIKit
import SceneKit
import ARKit
import Foundation
class ViewController: UIViewController, ARSCNViewDelegate {
@IBOutlet var sceneView: ARSCNView!
override func viewDidLoad() {
super.viewDidLoad()
var videoPlayer: AVPlayer
// Set the view's delegate
sceneView.delegate = self
// Show statistics such as fps and timing information
sceneView.showsStatistics = true
sceneView.scene = SCNScene()
}
override func viewWillAppear(_ animated: Bool) {
super.viewWillAppear(animated)
// Create a configuration
let configuration = ARImageTrackingConfiguration()
guard let trackingImages = ARReferenceImage.referenceImages(inGroupNamed: "card_scans", bundle: Bundle.main) else {
print("Could not load images")
return
}
// Setup configuration
configuration.trackingImages = trackingImages
configuration.maximumNumberOfTrackedImages = 16
// Run the view's session
sceneView.session.run(configuration)
}
override func viewWillDisappear(_ animated: Bool) {
super.viewWillDisappear(animated)
// Pause the view's session
sceneView.session.pause()
}
// MARK: - ARSCNViewDelegate
// Override to create and configure nodes for anchors added to the view's session.
public func renderer(_ renderer: SCNSceneRenderer, nodeFor anchor: ARAnchor) -> SCNNode? {
let node = SCNNode()
if let imageAnchor = anchor as? ARImageAnchor {
// Create a plane
let plane = SCNPlane(width: imageAnchor.referenceImage.physicalSize.width,
height: imageAnchor.referenceImage.physicalSize.height)
print("Asset identified as: \(anchor.name ?? "nil")")
// Set UIImage as the plane's texture
plane.firstMaterial?.diffuse.contents = UIImage(named:"replacementImage.png")
let planeNode = SCNNode(geometry: plane)
// Rotate the plane to match the anchor
planeNode.eulerAngles.x = -.pi / 2
node.addChildNode(planeNode)
}
return node
}
func session(_ session: ARSession, didFailWithError error: Error) {
// Present an error message to the user
}
func sessionWasInterrupted(_ session: ARSession) {
// Inform the user that the session has been interrupted, for example, by presenting an overlay
}
func sessionInterruptionEnded(_ session: ARSession) {
// Reset tracking and/or remove existing anchors if consistent tracking is required
}
}
<a href="https://i.stack.imgur.com/Brm51.jpg" rel="nofollow"><img alt="Card detection and replacement" class="b-lazy" data-src="https://i.stack.imgur.com/Brm51.jpg" data-original="https://i.stack.imgur.com/Brm51.jpg" src="https://etrip.eimg.top/images/2019/05/07/timg.gif" /></a>
Unfortunately, I met a limitation: card recognition becomes rife with false positives the more cards you add as AR targets to distinguish from (to clarify: not the number of targets simultaneously onscreen, but the library size of potential targets). While a 9-target library performed with 100% success rate, it didn't scale to a 68-target library (which is all the Munchkin treasure cards). The app tended to flit between 1-3 potential guesses when faced with each target. Seeing the poor performance, I didn't take the effort to add all 168 Munchkin cards in the end.
I used Chinese cards as the targets, which are all monochrome; I believe it could have performed better if I'd used the English cards as targets (as they are full-colour, and thus have richer histograms), but on my initial inspection of a 9-card set in each language, I was receiving as many warnings for the AR resources being hard to distinguish for English as I was for Chinese. So I don't think the performance would improve so far as to scale reliably to the full 168-card set.
Unity's Vuforia would be another option to approach this, but again has a hard limit of 50-100 targets. With (an eye-wateringly expensive) commercial licence, you can delegate target recognition to cloud computers, which could be a viable route for this approach.
Thanks for investigating the OCR and ML approaches – they would've been my next ports of call. If you find any other promising approaches, please do leave a message here!