"A meek endeavor to the triumph" by Sampath Jayarathna

Monday, November 15, 2010

Reading #28: iCanDraw

COMMENTS ON OTHERS:

           Francisco  

SUMMARY

              The paper describes a system that gives assistive feedback to users drawing a human face from an image. The system first applies face recognition to model the facial features in the image, then uses sketch recognition to evaluate the hand-drawn face, give corrective feedback, and respond to the user's actions. The teaching style is step-by-step instruction complemented by corrective feedback, guiding the user toward an accurate rendition of the image. The user interface consists of a drawing area, a reference image to draw, and an area that presents instructions and options to the user.

            The reference image is manipulated at each step to help the user see what to draw, for example by overlaying horizontal and vertical reference lines on sections of the image. Corrective feedback is provided through text markers as well as visual guidelines.

DISCUSSION

            I’m wondering how this method could help with our Project 3 work, a corrective feedback system for drawing a human eye. In what other ways could the current implementation move away from its template-based comparison toward some other way of comparing the user's sketch? I’m thinking maybe Paleo could help here: it can report the type of shape the user is drawing, and based on that we can decide whether a particular part of the required sketch has been completed.

Reading #18: Spatial Recognition and Grouping of Text and Graphics

COMMENTS ON OTHERS:

            Jonathan 

SUMMARY

              The paper proposes a spatial framework for simultaneous grouping and recognition of shapes and symbols in free-form ink diagrams. Recognition works by linking strokes into a proximity graph and then using a discriminative classifier to classify connected subgraphs as either a known symbol or an invalid combination of strokes. In preprocessing, the graph is built with nodes corresponding to strokes and edges connecting strokes in close proximity. A dynamic programming approach then iterates over the nodes, and the discriminative recognizer is applied to each candidate set. The classifier is trained with AdaBoost, and the features used to evaluate each stroke group are Viola-Jones image filters. The authors also note that the input may contain strokes that do not make up any shape at all, called garbage shapes.
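
To make the grouping step concrete for myself, here is a minimal sketch of building a proximity graph and enumerating its connected subgraphs, which would then be handed to the classifier. The distance threshold, the size cap, and the function names are my own assumptions and are not taken from the paper.

```python
from itertools import combinations
from math import hypot


def min_distance(a, b):
    """Smallest point-to-point distance between two strokes (lists of (x, y))."""
    return min(hypot(p[0] - q[0], p[1] - q[1]) for p in a for q in b)


def proximity_graph(strokes, threshold):
    """Nodes are stroke indices; an edge joins strokes closer than `threshold`.

    `threshold` is an assumed tuning parameter.
    """
    adj = {i: set() for i in range(len(strokes))}
    for i, j in combinations(range(len(strokes)), 2):
        if min_distance(strokes[i], strokes[j]) <= threshold:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def connected_subsets(adj, max_size):
    """Enumerate every connected set of nodes with at most `max_size` nodes."""
    found = set()
    frontier = {frozenset([n]) for n in adj}
    while frontier:
        found |= frontier
        grown = set()
        for subset in frontier:
            if len(subset) == max_size:
                continue
            # Grow the subset by one neighbouring stroke at a time.
            for n in subset:
                for m in adj[n]:
                    if m not in subset:
                        grown.add(subset | {m})
        frontier = grown - found
    return found


strokes = [
    [(0, 0), (1, 0)],
    [(1, 1), (2, 1)],
    [(10, 10), (11, 10)],
]
graph = proximity_graph(strokes, threshold=1.5)
candidates = connected_subsets(graph, max_size=2)
```

Each set in `candidates` is a stroke group that a recognizer would label as a known symbol or reject as an invalid combination.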

DISCUSSION

            The paper is extremely short and does not give enough information about the recognizer, which is the most important part of this work. Much of the discussion in the first few sections covers previous work, the preprocessing, and the search over the neighborhood graph. The authors mention the use of dynamic programming, but I am not sure A* is the best method to get optimal results. In my opinion, sketches tend to form a closed-loop connection of strokes from start to end, which makes it easy to find an optimal path using a forward-backward algorithm or Viterbi.

Sunday, November 14, 2010

Reading #17: Distinguishing Text from Graphics in On-line Handwritten Ink

COMMENTS ON OTHERS:

             Wenzhe Li  

SUMMARY

              The authors state that the proposed approach uses the characteristics of the sketch strokes themselves, the information provided by the gaps between strokes, and the temporal characteristics of the stroke sequence. It is based on discriminative machine learning techniques that infer the class of each stroke from the observed ink.

            The model starts by considering isolated strokes: features are extracted and used to train a probabilistic classifier based on a feed-forward neural network. The model is then augmented with temporal information to capture the correlation between class labels. Finally, the authors propose using information extracted from the gaps between successive strokes. The independent stroke model is said to be based on 9 features.

DISCUSSION

            The proposed work uses an HMM to improve performance by taking the context of each stroke into account. The authors state that they focus on temporal context because it leads to a 1D inference problem that can be solved efficiently using dynamic programming. The model uses the Viterbi algorithm to find the most probable labeling of the strokes. An HMM obviously requires initial probabilities, and it is not clear from the paper how the initial probability set was chosen to train the system. Whether the authors used a dummy set of probabilities or some heuristic is not really clear.
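
To see what Viterbi does with temporal context, here is a minimal sketch, assuming two labels (text and shape), a uniform prior, "sticky" transition probabilities, and per-stroke scores standing in for the classifier outputs. All of the numbers are made up for illustration; none of them come from the paper.

```python
import math


def viterbi(scores, trans, prior):
    """Most probable label sequence for a stroke sequence.

    scores: one dict per stroke, label -> score from the per-stroke classifier
    trans:  trans[a][b] = assumed probability that label b follows label a
    prior:  assumed probability of each label for the first stroke
    """
    states = list(prior)
    best = {s: math.log(prior[s]) + math.log(scores[0][s]) for s in states}
    back = []
    for obs in scores[1:]:
        prev, best, ptr = best, {}, {}
        for s in states:
            # Best predecessor for label s at this stroke.
            p = max(states, key=lambda r: prev[r] + math.log(trans[r][s]))
            best[s] = prev[p] + math.log(trans[p][s]) + math.log(obs[s])
            ptr[s] = p
        back.append(ptr)
    # Trace the best path backwards from the best final label.
    path = [max(states, key=lambda s: best[s])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]


# Hypothetical values: the middle stroke looks slightly more like a shape in isolation.
scores = [
    {"text": 0.9, "shape": 0.1},
    {"text": 0.45, "shape": 0.55},
    {"text": 0.9, "shape": 0.1},
]
trans = {"text": {"text": 0.8, "shape": 0.2}, "shape": {"text": 0.2, "shape": 0.8}}
prior = {"text": 0.5, "shape": 0.5}
labels = viterbi(scores, trans, prior)
```

With these made-up numbers the middle stroke is labeled text because of its neighbors, even though it would be labeled shape on its own. The open question remains where the prior and transition values would really come from.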

Reading #16: An Efficient Graph-Based Symbol Recognizer

COMMENTS ON OTHERS:

            JJ

SUMMARY

              The paper proposes a symbol recognizer for pen-based user interfaces in which symbols are represented as attributed relational graphs and an unknown symbol is assigned to the best-matching definition. Graph matching is done using four techniques: stochastic matching, error-driven matching, greedy matching, and sort matching. In an attributed relational graph (ARG), each node represents a geometric primitive and each edge represents the geometric relationship between two primitives. A definition for a symbol is created by constructing an average ARG from a set of training samples. To recognize an unknown symbol, its ARG is compared with the ARG of each symbol definition, and the best match is found using the graph matching techniques above.
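
As a rough illustration of the ARG idea, here is a minimal sketch of a graph with attributed nodes and edges, plus a simple greedy node assignment. The attributes (primitive kind, relative length, intersection count), the cost function, and the greedy rule are simplified assumptions of mine; the paper's actual error metrics and matching procedures are not reproduced here.

```python
from dataclasses import dataclass, field


@dataclass
class ARG:
    # nodes: list of (kind, relative_length), kind being "line" or "arc"
    nodes: list
    # edges: {(i, j): number_of_intersections} with i < j
    edges: dict = field(default_factory=dict)


def node_cost(a, b):
    """Assumed dissimilarity: kind mismatch plus difference in relative length."""
    return (0.0 if a[0] == b[0] else 1.0) + abs(a[1] - b[1])


def greedy_match(unknown, definition):
    """Assign each unknown node to the cheapest still-unused definition node."""
    unused = set(range(len(definition.nodes)))
    mapping = {}
    for i, node in enumerate(unknown.nodes):
        if not unused:
            break
        j = min(sorted(unused), key=lambda k: node_cost(node, definition.nodes[k]))
        mapping[i] = j
        unused.remove(j)
    return mapping


def match_error(unknown, definition, mapping):
    """Total node cost plus disagreement on the mapped edges."""
    err = sum(node_cost(unknown.nodes[i], definition.nodes[j])
              for i, j in mapping.items())
    for (a, b), value in unknown.edges.items():
        if a in mapping and b in mapping:
            key = tuple(sorted((mapping[a], mapping[b])))
            err += abs(value - definition.edges.get(key, 0))
    return err


# Hypothetical definition and an unknown symbol drawn in a different stroke order.
definition = ARG([("line", 0.5), ("arc", 0.3), ("line", 0.2)], {(0, 1): 1, (1, 2): 1})
unknown = ARG([("arc", 0.32), ("line", 0.18), ("line", 0.5)], {(0, 1): 1, (0, 2): 1})
mapping = greedy_match(unknown, definition)
error = match_error(unknown, definition, mapping)
```

A recognizer along these lines would compute `error` against every definition and pick the smallest one.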

DISCUSSION

             The authors claim that stochastic, error-driven, and greedy matching are search-based classifiers, while Sort is based on a fixed-orientation method. Does this mean the other three methods are orientation invariant? Figure 3, however, actually shows a problem with searching, where the orientation brings a dissimilar pair into the matching during comparison.

Reading #15: An Image-Based, Trainable Symbol Recognizer for Hand-drawn Sketches

COMMENTS ON OTHERS:

           JJ  

SUMMARY

              The paper proposes a trainable hand-drawn symbol recognizer based on a multi-layer recognition scheme, with symbols represented internally as binary templates. An ensemble of four different classifiers is used to rank symbols by similarity to an unknown symbol, and the scores are aggregated to produce a combined score. The symbol with the best combined score is assigned to the unknown symbol. All four classifiers are template matching techniques that compute the similarity between symbols. The authors use a polar-coordinate-based technique to compensate for the rotation sensitivity of template matching. They state that, because of its binary template approach, the system is particularly useful for sketchy input with heavy over-stroking and erasing.
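
The score aggregation can be sketched roughly as follows, assuming each classifier returns a similarity per candidate symbol (higher is better) and that scores are min-max normalized before being summed. The normalization choice, the symbol names, and the numbers are my own assumptions for illustration.

```python
def normalize(scores):
    """Rescale one classifier's scores to [0, 1] so classifiers are comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {name: 1.0 for name in scores}
    return {name: (value - lo) / (hi - lo) for name, value in scores.items()}


def combine(per_classifier):
    """Sum the normalized scores and return the best symbol with all totals."""
    total = {}
    for scores in per_classifier:
        for name, value in normalize(scores).items():
            total[name] = total.get(name, 0.0) + value
    best = max(total, key=total.get)
    return best, total


# Hypothetical similarity scores from two template-matching classifiers.
per_classifier = [
    {"resistor": 0.9, "capacitor": 0.4, "diode": 0.1},
    {"resistor": 0.6, "capacitor": 0.7, "diode": 0.2},
]
best, total = combine(per_classifier)
```

Here the second classifier prefers "capacitor", but the combined score still favors "resistor" because the first classifier separates the two much more strongly.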

DISCUSSION

             The authors state that the binary template approach is useful for sketchy, heavily over-stroked and erased input, but it is questionable whether down-sampling and framing the symbol to 48x48 is the best approach. Maybe the authors should experiment with other techniques, such as binarizing the ink and then extracting a skeleton of the bits, which would preserve the sketch input rather than reducing it.

Saturday, November 13, 2010

Reading #14: Using Entropy to Distinguish Shape Versus Text in Hand-Drawn Diagrams

COMMENTS ON OTHERS:

            Jianjie (JJ) Zhang. 

SUMMARY

            The paper proposes a method to distinguish between shape and text strokes based on entropy rate. The authors state that the entropy rate is significantly higher for text strokes than for shape strokes. They propose a single feature, the zero-order entropy rate, which achieves a correct classification rate of 92.06% with a trained threshold. Zero-order entropy treats each symbol independently of the previous symbol in the sketch. Here a symbol is a value assigned to each point according to the angle it forms with its neighboring points in a text or shape stroke.
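
A minimal sketch of the idea, assuming the symbol for each interior point is its turning angle quantized into equal-width bins, and that zero-order entropy is the Shannon entropy of the resulting symbol frequencies. The number of bins and the example strokes are assumptions of mine, not the paper's alphabet or data.

```python
import math
from collections import Counter


def angle_symbols(points, bins=6):
    """Map each interior point to a symbol from its turning angle (0..pi)."""
    symbols = []
    for a, b, c in zip(points, points[1:], points[2:]):
        h1 = math.atan2(b[1] - a[1], b[0] - a[0])
        h2 = math.atan2(c[1] - b[1], c[0] - b[0])
        # Wrap the heading change into [0, pi] before quantizing.
        turn = abs(math.atan2(math.sin(h2 - h1), math.cos(h2 - h1)))
        symbols.append(min(int(turn / math.pi * bins), bins - 1))
    return symbols


def zero_order_entropy(symbols):
    """Shannon entropy (bits) of symbol frequencies, ignoring symbol order."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


# Hypothetical strokes: a straight line versus a jagged, text-like squiggle.
line = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0)]
squiggle = [(0, 0), (1, 0), (1, 1), (2, 1), (3, 1), (2, 2)]
line_entropy = zero_order_entropy(angle_symbols(line))
squiggle_entropy = zero_order_entropy(angle_symbols(squiggle))
```

A classifier in this spirit would compare the entropy (normalized in some way, for example by stroke size) against a trained threshold.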

DISCUSSION

             The authors argue that using an arbitrary set of features with a trial-and-error approach is time consuming, and I totally agree. I had the same thought when reading the ink features paper, which selects features through statistical analysis. Time is something to worry about in a sketch system with both text and shapes, not only for distinguishing them but also for recognizing them against a library of sketches. The proposed system seems to be a simpler way to achieve text/shape classification, but it's still not the best solution for complex shapes, which can be quite similar in entropy even to a simple character set.

Reading #13: Ink Features for Diagram Recognition

COMMENTS ON OTHERS:

            Francisco Vides 

SUMMARY

             The paper proposes using formal statistical analysis to identify the key ink features that improve recognition. The features measure aspects of an ink stroke's curvature, size, time, and intersections, and use similar aspects to detect relationships between strokes. The paper covers the investigation of a range of possible ink features, the collection and analysis of the feature data, and the initial results of evaluating a text/shape divider based on the key ink features. The proposed feature set includes 46 features grouped into 7 categories: size, time, intersection, curvature, pressure, operating system recognition values, and inter-stroke gaps.

DISCUSSION

            Am I missing it, or is the set of 46 features not actually listed in the paper? I’m also wondering whether this technique only divides the whole sketch into two groups, text or shape, and does not go further to identify each individual component against a library component. The authors state that they concentrated on identifying the features that distinguish text strokes from shape strokes, using a formal method for optimal ink feature selection; does this mean only identifying those two groups? I know this is important in a system like COA, where text and shapes may be mixed together, but how feasible is this approach when something must be recognized within a time limit of, say, sixty seconds? Would we have one recognition engine to separate text and shape, and then apply another recognition engine to each shape and text separately?