"A meek endeavor to the triumph" by Sampath Jayarathna

Tuesday, December 14, 2010

Pictures of the Day - for some end term fun.......

Sri Lankan police!


Results of Love Marriage..........It's always the kids that suffer. His name is Zonky!!!!


Thursday, December 09, 2010

Reading #30: Tahuti: A Geometrical Sketch Recognition System for UML Class Diagrams

COMMENTS ON OTHERS:

            Jonathan 

SUMMARY

              Tahuti is a multi-stroke sketch recognition environment for class diagrams in UML, in which users can sketch diagrams on a tablet or whiteboard the same way they would on paper, and the sketches are interpreted by the computer. The proposed system differs from graffiti-based approaches to this task in that it allows users to draw an object as they would with pen and paper. The system recognizes objects by their geometrical properties, examining line segments' angles, slopes, and other properties, rather than requiring the user to draw the objects in a pre-defined manner. Recognizing objects by their geometrical properties gives users the freedom to sketch and edit diagrams naturally, while maintaining a high level of recognition accuracy. The system uses a multi-layer framework for sketch recognition, which allows it to recognize multi-stroke objects by their geometrical properties. The stages of the multi-layer recognition framework are: 1) Preprocessing 2) Selection 3) Recognition 4) Identification. After each stroke is drawn, rudimentary processing is performed on the stroke, reducing it to an ellipse or a series of line and curve segments. A collection of spatially and temporally close strokes is then chosen, and the line segments contained in the collection are recognized as either an editing command or a viewable object.
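To make the four stages concrete, here is a toy Python sketch of such a pipeline. The grouping threshold and the recognition rules (segment counts standing in for Tahuti's angle and slope tests) are invented for illustration and are not the paper's actual criteria:

```python
from dataclasses import dataclass, field

@dataclass
class Stroke:
    points: list          # (x, y) samples
    t: float              # timestamp of the stroke
    segments: list = field(default_factory=list)  # filled by preprocessing

def preprocess(stroke):
    """Reduce a raw stroke to line segments (a crude stand-in for Tahuti's
    ellipse/segment fitting: here we just connect consecutive samples)."""
    pts = stroke.points
    stroke.segments = [(pts[i], pts[i + 1]) for i in range(len(pts) - 1)]
    return stroke

def select(strokes, max_dt=2.0):
    """Group temporally close strokes into candidate collections."""
    groups, current = [], []
    for s in sorted(strokes, key=lambda s: s.t):
        if current and s.t - current[-1].t > max_dt:
            groups.append(current)
            current = []
        current.append(s)
    if current:
        groups.append(current)
    return groups

def recognize(group):
    """Return all plausible interpretations for a stroke collection
    (placeholder rules standing in for the real geometric tests)."""
    n_segments = sum(len(s.segments) for s in group)
    interpretations = []
    if n_segments >= 4:
        interpretations.append("class-rectangle")
    if n_segments <= 3:
        interpretations.append("arrow")
    return interpretations

def identify(interpretations):
    """Pick one final interpretation (here: the first candidate wins)."""
    return interpretations[0] if interpretations else None
```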

            During the recognition stage, all stroke collections are examined to see if a particular collection could be interpreted as a viewable object or an editing command. During the identification stage, a final interpretation is chosen, and a collection of strokes is identified as a viewable object or an editing command. All possible interpretations found in the recognition stage are passed to the identification stage, which selects the final interpretation based on a set of precedence rules.

DISCUSSION

            Tahuti combines the sketching freedom provided by paper sketches and the processing power available in an interpreted diagram. The system is based on a multi-layer recognition framework and recognizes objects by their geometrical properties, rather than requiring that the user draw the objects in a pre-defined manner. This system considers only groups of strokes that fall within a spatial bound. This spatial bound on its own may not be enough, especially for regions that contain many overlapping strokes. For example, if there are ten lines that all fall within a small region, to identify an arrow, the system still may have to try combinations of these lines.

Reading #29: Scratch Input: Creating Large, Inexpensive, Unpowered and Mobile Finger Input Surfaces

COMMENTS ON OTHERS:

           Jonathan 

SUMMARY

              In this paper, the authors provide a new input technique that allows small devices to appropriate existing, large, passive surfaces such as desks and walls, for use as a kind of input device. This Scratch Input technique operates by listening to the sound of “scratching” (e.g., with a fingernail) that is transmitted through the surface material. This signal can be used to recognize a vocabulary of gestures carried out by the user. The proposed sensor is simple and inexpensive, and can be easily incorporated into mobile devices, enabling them to appropriate whatever solid surface they happen to be resting on. Alternately, it can be very easily deployed, for example, to make existing walls or furniture input-capable.

            To capture sound transmitted through solid materials, the authors use a modified stethoscope, which is particularly well suited to both amplifying sound and detecting high-frequency noises. It is attached to a generic microphone, which converts the sound into an electrical signal. In this particular implementation, the signal is amplified and connected to a computer through the audio-input jack. Scratch Input's non-spatial property gives it a significantly different character from many other surface input techniques and does preclude some uses. Results indicate participants were able to achieve an average accuracy of 89.5%. As hypothesized, accuracy suffered as gesture complexity grew; gestures with two or fewer motions achieved accuracies in excess of 90%.
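As a crude illustration of how gestures can be read off the amplified signal, counting amplitude peaks gives the number of scratch strokes. The threshold and minimum gap below are made-up values for illustration, not the paper's parameters, and a real implementation would first band-pass the signal around the high frequencies that scratching produces:

```python
def count_scratch_peaks(samples, threshold=0.5, min_gap=100):
    """Count amplitude peaks in an audio buffer as a rough proxy for the
    number of scratch strokes. `threshold` is an amplitude level and
    `min_gap` (in samples) keeps one burst from counting multiple times."""
    peaks = 0
    last_peak = -min_gap
    for i, v in enumerate(samples):
        if abs(v) >= threshold and i - last_peak >= min_gap:
            peaks += 1
            last_peak = i
    return peaks
```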

DISCUSSION

            Scratch Input is an acoustic-based finger input technique that can be used to create large, inexpensive, and mobile finger input surfaces, allowing mobile devices to appropriate the surfaces on which they rest for gestural input. The evaluation revealed that Scratch Input is both easy to use and accurate on a variety of surfaces. Notably, most mechanical sensors are engineered to provide relatively flat response curves over the range of frequencies relevant to the signal. This is a desirable property for most applications, where a faithful representation of the input signal, uncolored by the properties of the transducer, is desired. However, because only a specific set of frequencies is conducted through the surface in response to scratch input, a flat response curve leads to the capture of irrelevant frequencies and thus to a low signal-to-noise ratio.

Reading #27: K-sketch: A 'Kinetic' Sketch Pad for Novice Animators

COMMENTS ON OTHERS:

            Francisco

SUMMARY

              The authors propose an informal, 2D animation system called K-Sketch, the "Kinetic" Sketch Pad. K-Sketch is a pen-based system that relies on users' intuitive sense of space and time while still supporting a wide range of uses. K-Sketch animations are often rough, but they are still useful in informal situations and as prototypes of formal animations. The goal of this project was not to design novel interaction techniques but rather to focus on high-level choices about tool features. Thus, the authors conducted field studies to find out how an informal animation tool might be used and whether it could be made general-purpose.

From these interviews with nineteen animators and would-be animators, the authors compiled a library of 72 usage scenarios for an animation system. The paper presents these results in more detail and describes a novel optimization technique that enables K-Sketch's interface to be simultaneously fast, simple, and powerful.

            The process begins by reviewing interviews with animators and with non-animators, followed by an analysis of the library of usage scenarios collected and a description of the interface optimization technique. Since many novice animators wish to do what experienced animators do, the authors began their field studies by interviewing eight experienced animators to see how an informal tool would fit in their work process. K-Sketch currently supports all ten desired animation operations: Translate, Scale, Rotate, Set Timing, Move Relative, Appear, Disappear, Trace, Copy Motion, and Orient to Path.

DISCUSSION

            These results show that K-Sketch’s simple interface has strong benefits. The simplicity of K-Sketch’s interface also meant less practice time was needed before tasks could be performed. These tools allow designers to build prototypes or storyboards of dynamic systems by creating sketches according to conventional visual languages.

Reading #26: Picturephone: A Game for Sketch Data Capture


COMMENTS ON OTHERS:

            Francisco 

SUMMARY

             This paper proposes a multi-player sketching game called Picturephone. Its purpose is to capture hand-drawn sketches and player-provided descriptions, which can be used by other researchers to develop or test sketch recognition systems. Picturephone is not a new recognition system; it is a tool for capturing hand-made drawings in many domains by many people, along with human-classified descriptions. Picturephone is inspired by the children's game called Telephone, in which a player privately describes something to the person on their left, that person conveys the message to the person on their left, and so on; over time the message may change drastically. Picturephone uses a web-oriented client/server architecture and is known to run on Windows, Mac OS X, and Ubuntu Linux. Both client and server are written in Java. Communication is done with the standard HTTP protocol using the host web browser's network connection, allowing the game to work unimpeded by firewall or router restrictions.

            There are three primary game modes: draw, describe, and rate. Players are randomly assigned one of these modes. In Draw mode (Figure 2), players are given a text description and are asked to draw it using the sketching surface at the right. A time limit is enforced to encourage simplicity. 

DISCUSSION

            Picturephone is the first instance of a class of planned sketching games that could provide researchers with a method to acquire data about drawings, including the physical act of sketching as well as how people describe the drawn elements. The paper lacks most of its implementation details and significantly overlaps with the paper described in Reading #24. In the example walkthrough, the game asks the player to draw using the description "three concentric circles". After completing the drawing, the player hits 'Done'. The game then chooses among the player's preferred modes: sketching, describing, or rating.

The user gets a few ‘rate’ phases in a row here. It simply asks the player to rate how closely the two pictures match. The picture on the left was the basis for a description, and the one on the right was a sketch made based on that description. When a player rates these two drawings, points are assigned to the people who made both sketches as well as whoever made the mediating description. As you can see, the task of rating is rather subjective, but since it collects lots of ratings for the same pair it seems to work out fairly well. Eventually the game picks ‘describe’ mode, so here you see the player typing in a description about a floor plan layout of a square house with a bathroom and kitchen in the corners. This process continues until the player chooses to end their session.

Reading #25: A Descriptor for Large Scale Image Retrieval Based on Sketched Feature Lines

COMMENTS ON OTHERS:

            Jonathan

SUMMARY

              The main contribution of this work is a sketch-based query system for image databases containing millions of images. Like most current retrieval algorithms for large image databases, it is based on a small descriptor that captures essential properties of the images. A main feature is that it elegantly addresses the asymmetry between the binary user sketch on the one hand and the full-color image on the other: the descriptor is constructed in such a way that both the full-color image and the sketch undergo exactly the same preprocessing steps. The resulting sketch-based image retrieval system can be used by any novice user to quickly query the image database. The power of the system stems from exploiting the vast amount of existing images, which offsets obvious deficits in image descriptors and search.

The authors state that the descriptor's performance is superior to a variant of the MPEG-7 edge histogram descriptor in a quantitative evaluation, in which they measured the retrieval ranks of 27 sketches created from reference images in the image database.

DISCUSSION

            I’m totally perplexed. Is this related to sketch recognition? 

Reading #24: Games for Sketch Data Collection

COMMENTS ON OTHERS:

            Hong-Hoe (Ayden) Kim

SUMMARY

              This paper presents multi-player sketching games to capture a data corpus of hand-drawn sketches and player-provided descriptions from many users on a wide range of subjects. Two systems with distinct game mechanics are described: Picturephone and Stellasketch. Picturephone has three primary game modes: draw, describe, and rate. Players are randomly assigned one of these modes. Stellasketch is a synchronous, multi-player sketching game similar to the parlor game Pictionary. One player is asked to make a drawing based on a secret clue. The other players see the drawing unfold as it is made and privately label the drawing. While Picturephone's descriptions are meant to be used to recreate a drawing, Stellasketch's labels simply state what the sketch depicts. Labels are timestamped, so they can be associated with sketches at various stages of completion.

The characteristics of the two games' data differ. While Picturephone's sketches are complete by the time others describe them, a Stellasketch drawing is labeled as it is made. Furthermore, Picturephone descriptions are generally longer and in approximately complete sentences, while Stellasketch labels are often short noun phrases. Because a Stellasketch drawing is labeled as it is made, players usually furnish multiple interpretations, and there is often significant agreement among players; agreement indicates those interpretations are more 'correct'. Sometimes labels cluster into more than one group. While Picturephone allows people to play at their own pace, a game of Stellasketch requires several people to play at the same time.

DISCUSSION

            This paper has presented Picturephone and Stellasketch, two sketching games for collecting data about how people make and describe hand-made drawings.

Reading #23: InkSeine: In Situ Search for Active Note Taking

COMMENTS ON OTHERS:

         Jonathan  

SUMMARY

            InkSeine is a TabletPC application that offers rapid, minimally distracting interactions for users to seek, gather, and manipulate the “task detritus” of electronic work (links to documents, clippings from web pages, or key emails on a topic) across multiple notebook pages. Search offers a facile means to assemble such collages of notes, documents, and bitmaps while keeping the user engrossed in the inking experience as much as possible. InkSeine’s primary work surface is an electronic notebook that allows users to jot ink notes on a series of pages. Thus, all of the search facilities that are the focus of this paper are designed in support of the inking task itself.

InkSeine provides a quick way to collect and annotate content from multiple documents. While current features for gathering content provide ways to drag out information from the search panel, there are opportunities to further leverage the value of in situ search by allowing the user to pull in material for searches from the notes.  InkSeine’s in situ ink search strategy helps to reduce the cognitive barrier between having the thought to do a search while inking, to actually capturing that thought, and potentially acting on it at a later time. 

DISCUSSION

            InkSeine is a Tablet PC search application that allows users to store a pointer to a search via a breadcrumb object intermixed with their handwritten notes. Hinckley discovered that conventional GUI tooltips could be easily blocked by the hand. InkSeine presented a variation on a theme in which gestures were shown in situ as highlighter annotations over application widgets; the annotations could be toggled on and off with a button press. This technique was well suited to disclosing simple gestures associated with explicit UI widgets. However, even with only a few gestures, the technique cluttered the workspace, and it did not provide support for accessing more detailed information about subtle or complex gestures or for displaying gestures that required a document context (e.g., a selection lasso).

Reading #22: Plushie: An Interactive Design System for Plush Toys

COMMENTS ON OTHERS:

            Hong-Hoe (Ayden) Kim

SUMMARY

            Plushie is an interactive system that allows nonprofessional users to design their own original plush toys. To design a plush toy, one needs to construct an appropriate two-dimensional (2D) pattern, but it is difficult for non-professional users to design a 2D pattern appropriately. Plushie allows the user to create a 3D plush toy model from scratch by simply drawing its desired silhouette. The user can also edit the model, for example by cutting it or adding a part, using a simple sketching interface. The resulting model is always associated with a 2D pattern, and the 3D model is the result of a physical simulation that mimics the inflation effect caused by stuffing.

The user interactively draws free-form strokes on the canvas as gestures, and the system performs the corresponding operations. The system also provides special editing operations tailored for plush toy design, including create, cut, create parts, pull, insert, and delete. The authors used a standard triangle mesh for the representation of the 3D model and the 2D patches, with a relatively coarse mesh (1000-2000 vertices) to achieve interactive performance. Each vertex, edge, and face of the 3D mesh is associated with corresponding entities in the 2D mesh. A 3D mesh is always given as the result of applying a physical simulation to the assembled 2D pattern. To be more precise, the physical simulation applied to the 3D mesh is governed by the rest length of each edge, which is defined by the 2D mesh geometry.
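The role of the rest lengths can be sketched as a simple spring model, in which each 3D edge is pulled toward the length that edge has in the 2D pattern. This is only a toy version of the idea; Plushie's actual solver also models the inflation pressure from stuffing, damping, and more:

```python
import math

def spring_forces(positions, edges, rest_lengths, k=1.0):
    """Per-vertex spring forces for a 3D mesh whose edge rest lengths
    come from the 2D pattern. `positions` is a list of (x, y, z) tuples,
    `edges` a list of (i, j) vertex-index pairs, `k` a stiffness constant."""
    forces = [[0.0, 0.0, 0.0] for _ in positions]
    for (i, j), rest in zip(edges, rest_lengths):
        dx = [positions[j][a] - positions[i][a] for a in range(3)]
        length = math.sqrt(sum(d * d for d in dx))
        if length == 0:
            continue
        # Hooke's law: force proportional to stretch beyond the rest length
        mag = k * (length - rest)
        for a in range(3):
            forces[i][a] += mag * dx[a] / length
            forces[j][a] -= mag * dx[a] / length
    return forces
```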

DISCUSSION

            Creating 3D models is often a hard and laborious task due to the complexity and diversity of shapes involved, the intricate relationships between them, and the variety of surface representations. Current high-end modeling systems such as Maya, AutoCAD, and CATIA incorporate powerful tools for accurate and detailed geometric model construction and manipulation. These systems typically employ the WIMP (Window, Icon, Menu, Pointer) interface paradigm, which is based on selecting operations from menus and floating palettes, entering parameters in dialog boxes, and moving control points.

In this case, sketched input is used to define an initial state of a complex physical simulation or procedural model, domains that are typically encumbered with many parameters and initial settings to define. Mori and Igarashi provide an intriguing example of how SBIM techniques could be integrated with physical simulation: “if one can run an aerodynamic simulation during the interactive design of an airplane model, it might be helpful to intelligently adjust the entire geometry in response to the user’s simple deformation operations so that it can actually fly.” Exploring the output space of a procedural or physical model can be much more natural and efficient with free-form gestures, a notion that needs to be explored more fully in the future.

Reading #21: Teddy: A Sketching Interface for 3D Freeform Design

COMMENTS ON OTHERS:

            Francisco

SUMMARY

            The paper proposes Teddy, a sketching interface for designing 3D freeform objects. It lets the user interactively draw 2D freeform strokes specifying the silhouette of an object, and the system automatically constructs a 3D polygonal surface model based on those strokes.

The user does not have to manipulate control points or combine complicated editing operations. Using this technique, even first-time users can create simple, yet expressive 3D models within minutes. 

Teddy's physical user interface is based on traditional 2D input devices such as a standard mouse or tablet. As soon as the user finishes drawing a stroke, the system automatically constructs a corresponding 3D shape. The program supports operations for creating a new object, painting and erasing on the surface, extrusion, cutting, smoothing, and transformation. In order to remove noise in the handwritten input stroke and to construct a regular polygonal mesh, every input stroke is re-sampled to form a smooth polyline with uniform edge length before further processing.
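The uniform re-sampling step can be illustrated in a few lines of Python. This is a generic polyline re-sampler, not Teddy's actual code; the endpoint is always kept so the stroke length is preserved:

```python
import math

def resample(points, spacing):
    """Resample a 2D polyline so consecutive points are `spacing` apart.
    Walks each segment, emitting a point every time the accumulated
    arc length reaches a multiple of `spacing`."""
    out = [points[0]]
    carried = 0.0  # arc length carried over from previous segments
    for p, q in zip(points, points[1:]):
        seg = math.dist(p, q)
        while carried + seg >= spacing:
            t = (spacing - carried) / seg
            p = (p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1]))
            out.append(p)
            seg -= spacing - carried
            carried = 0.0
        carried += seg
    if out[-1] != points[-1]:
        out.append(points[-1])  # keep the stroke's true endpoint
    return out
```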

DISCUSSION

            The authors follow the general philosophy of keeping the user interface simple by inferring the intention of a few, easy-to-learn commands, rather than providing an exhaustive set of commands and asking the user to set several parameters for each one. However, this is done by limiting the complexity and types of shapes that the user can create. Furthermore, the paper does not provide resulting accuracy measures or comparison parameters.

Wednesday, December 08, 2010

Reading #20: MathPad2: A System for the Creation and Exploration of Mathematical Sketches

COMMENTS ON OTHERS:

            Francisco 

SUMMARY

              This paper presents MathPad2, a prototype application for creating mathematical sketches. MathPad2 incorporates a novel gestural interaction paradigm that lets users modelessly create handwritten mathematical expressions using familiar math notation and free-form diagrams, including associations between the two, with only a stylus.

According to the authors, mathematical sketching is the process of making simple illustrations from a combination of handwritten 2D mathematical expressions and sketched diagrams. Combining mathematical expressions with diagram elements, called an association, is done either implicitly, using diagram labels as input to an inferencing engine, or manually, using a simple gestural user interface.

DISCUSSION

            It is hard to see the novelty of the work beyond applying a gesture recognition system to identifying mathematical expressions. Most of the gestures used in the work are of a hackneyed nature; only the lasso capability for grouping stands out as original. In my personal opinion, the work is more the functionality of a regular character recognition system, with gestures simply used to trigger the execution of a function for processing, combining, or editing/deleting.

Reading #19: Diagram Structure Recognition by Bayesian Conditional Random Fields

COMMENTS ON OTHERS:

            Jonathan  

SUMMARY

              The paper proposes recognition of hand-drawn diagram structures using Bayesian conditional random fields (BCRFs). BCRFs are a generalization of Bayes point machines, which are discriminative Bayesian classifiers for single elements. The method jointly analyzes all drawing elements in order to incorporate contextual cues. According to the authors, the advantages of BCRFs over conventionally trained CRFs include model averaging and automatic hyperparameter tuning.

            The application of BCRFs to ink classification here is to discriminate between containers and connectors in drawings of organization charts. This task includes subdividing pen strokes into fragments, constructing a conditional random field on the fragments, BCRF training, and inference on the network. In the first step, the strokes are divided into simpler components called fragments. Fragments should be small enough to belong to a single container or connector, since strokes occasionally span more than one part when drawn without lifting the pen. The fragments are chosen to be portions of strokes that form straight line segments. When constructing the conditional random field on the fragments, each fragment is represented by a node in the network.

DISCUSSION

            The proposed work shows an interesting method to segment ink features in a sketch application. The method is mathematically rigorous and therefore hard to follow from the content alone. A great deal of information about the feature extraction is omitted from the paper, which makes the work shine a bit less. Also note that the accuracy of plain BCRFs is lower than the joint performance of BCRFs with ARD.

Monday, November 15, 2010

Reading #28: iCanDraw

COMMENTS ON OTHERS:

           Francisco  

SUMMARY

              The paper describes a feedback system that assists users in drawing a human face from an image. The work starts with a face recognition system to model the features of a human face from the image, then uses sketch recognition to evaluate the hand-drawn face, giving corrective feedback and receiving actions from the user. The teaching style of the proposed model is step-by-step instructions complemented by corrective feedback to assist a user toward creating an accurate rendition of an image. The user interface consists of a drawing area, a reference image to draw, and an area that provides instructions and options to the user.

            The reference image is manipulated at each step to help the user see what to draw, including reference lines across horizontal and vertical image sections. Corrective feedback is provided using text markers as well as visual guidelines.

DISCUSSION

            I'm just wondering how this particular method can help with our project 3 work, where we are building a corrective feedback system for drawing a human eye. In what other ways could the current implementation be modified from its template base to compare the user's sketch? I'm thinking maybe Paleo is some sort of help here: it can give the type of shape the user is drawing, and based on this we can decide whether a particular part of the required sketch is completed or not!

Reading #18: Spatial Recognition and Grouping of Text and Graphics

COMMENTS ON OTHERS:

            Jonathan 

SUMMARY

              The paper proposes a spatial framework for simultaneous grouping and recognition of shapes and symbols in free-form ink diagrams. Recognition is done by linking strokes into a proximity graph and then using a discriminative classifier to classify connected subgraphs as either known symbols or invalid combinations of strokes. In preprocessing, a graph is created whose nodes correspond to strokes and whose edges connect strokes in close proximity. A dynamic programming approach is then used to iterate over the nodes, and discriminative recognition is applied to each candidate set. The classifier is AdaBoost, with Viola-Jones image filters as the features used to evaluate each stroke group. The authors also note that it is possible to input strokes that do not make up a shape at all, called garbage shapes.
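Building the proximity graph can be sketched roughly as follows. The bounding-box gap test here is a simplified stand-in for whatever proximity measure the paper actually uses, and the radius is an invented parameter:

```python
def proximity_graph(strokes, radius):
    """One node per stroke; an edge whenever two strokes' bounding boxes
    come within `radius` of each other. `strokes` is a list of point lists."""
    def bbox(pts):
        xs = [p[0] for p in pts]
        ys = [p[1] for p in pts]
        return min(xs), min(ys), max(xs), max(ys)

    def gap(a, b):
        # Separation between two axis-aligned boxes (0 if they overlap)
        dx = max(b[0] - a[2], a[0] - b[2], 0)
        dy = max(b[1] - a[3], a[1] - b[3], 0)
        return max(dx, dy)

    boxes = [bbox(s) for s in strokes]
    edges = []
    for i in range(len(strokes)):
        for j in range(i + 1, len(strokes)):
            if gap(boxes[i], boxes[j]) <= radius:
                edges.append((i, j))
    return edges
```

Connected subgraphs of this graph are then the candidate stroke groups handed to the classifier.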

DISCUSSION

            The paper is extremely short, and not enough information is provided about the recognizer, which in this case is the most important part of the work. Much of the discussion in the first few sections is about previous work, the preprocessing, and the process of building the search tree on the neighborhood graph. The authors mention a use of dynamic programming somewhere in the content, but I am not sure A* is the best method to get optimal results. In my opinion, sketches tend to represent more of a closed-loop connection of strokes from start to end, and this makes it easy to find an optimal path using a forward-backward algorithm or Viterbi.

Sunday, November 14, 2010

Reading #17: Distinguishing Text from Graphics in On-line Handwritten Ink

COMMENTS ON OTHERS:

             Wenzhe Li  

SUMMARY

              The authors state that the proposed study utilizes characteristics of the sketch strokes themselves, information provided by the gaps between strokes, and temporal characteristics of the stroke sequence. The approach is based on discriminative machine learning techniques to infer the class of a stroke from the observed ink.

            The proposed model starts by considering isolated strokes, extracting features, and training a probabilistic classifier based on a feed-forward neural network. The model is then augmented with temporal information to capture the correlation between class labels. Finally, information extracted from the gaps between successive strokes is incorporated. The model is said to be based on nine features extracted by the independent stroke model.

DISCUSSION

            The proposed work uses an HMM to enhance performance by taking the context of each stroke into account. The focus is on the use of temporal context, since it leads to a 1D inference problem which can be solved efficiently using dynamic programming techniques. The model uses the Viterbi algorithm on the trained model to find the most probable labeling of the strokes. It is also obvious that the HMM requires initial probability estimates, and it is not clear from the context how the initial probability set is chosen to train the system; whether the authors used a dummy set of probabilities or heuristic probabilities is not really clear.
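For reference, the Viterbi recursion behind this kind of temporal smoothing can be written in a few lines. The two states in mind are text vs. shape; all probabilities below are illustrative log values I made up, not the paper's learned parameters:

```python
def viterbi(obs_logprob, trans_logprob, init_logprob):
    """Most-likely state sequence for a stroke sequence under an HMM.
    `obs_logprob[t][s]` is the log-likelihood of stroke t under state s,
    `trans_logprob[r][s]` the log-probability of moving from r to s."""
    n_states = len(init_logprob)
    # best[t][s] = best log-prob of any path ending in state s at time t
    best = [[init_logprob[s] + obs_logprob[0][s] for s in range(n_states)]]
    back = []
    for t in range(1, len(obs_logprob)):
        row, ptr = [], []
        for s in range(n_states):
            cand = [best[-1][r] + trans_logprob[r][s] for r in range(n_states)]
            r = max(range(n_states), key=lambda i: cand[i])
            row.append(cand[r] + obs_logprob[t][s])
            ptr.append(r)
        best.append(row)
        back.append(ptr)
    # Trace back the best path from the best final state
    s = max(range(n_states), key=lambda i: best[-1][i])
    path = [s]
    for ptr in reversed(back):
        s = ptr[s]
        path.append(s)
    return path[::-1]
```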

Reading #16: An Efficient Graph-Based Symbol Recognizer

COMMENTS ON OTHERS:

            JJ

SUMMARY

              The paper proposes a symbol recognizer for pen-based user interfaces in which symbols are represented as attributed relational graphs, and the best match for an unknown symbol is found by graph matching. The graph matching is done using four techniques: stochastic matching, error-driven matching, greedy matching, and sort matching. In an attributed relational graph (ARG), each node represents a geometric primitive, and the edges represent the geometric relationships between them. A definition for a symbol is created by constructing an average ARG from a set of training samples. Recognition compares the ARG of an unknown symbol with the ARG of each definition symbol to find the best match using the graph matching techniques above.

DISCUSSION

             The authors claim that stochastic, error-driven, and greedy matching are search-based classifiers, while sort matching is an orientation-fixed method. Does this mean the other three methods are orientation invariant? Figure 3 actually presents a problem with the search, where orientation brings a dissimilar pair into the match during comparison.

Reading #15: An Image-Based, Trainable Symbol Recognizer for Hand-drawn Sketches

COMMENTS ON OTHERS:

           JJ  

SUMMARY

              The paper proposes a trainable, hand-drawn symbol recognizer based on a multi-layer recognition scheme in which symbols are internally represented as binary templates. An ensemble of four different classifiers is used to rank symbols based on their similarity to an unknown symbol, and the scores are aggregated to produce a combined score; the best score determines the label assigned to the unknown symbol. All four classifiers are template matching techniques that compute the similarity between symbols. The authors use a polar-coordinate-based technique to compensate for the rotation sensitivity of template matching. They state that the proposed system is particularly robust to sketchy input such as heavy overstroking and erasing, due to its binary template approach.

DISCUSSION

             The authors state that the binary template approach is useful for sketchy, overstroked, and erased sketches, but it is questionable whether down-sampling and framing the input to a 48x48 grid is the best approach. Maybe the authors should experiment more with other techniques, such as binarizing the ink features and then extracting a skeleton of the bits, which would preserve the sketch input rather than reducing it.

Saturday, November 13, 2010

Reading #14. Using Entropy to Distinguish Shape Versus Text in Hand-Drawn Diagrams

COMMENTS ON OTHERS:

            Jianjie (JJ) Zhang. 

SUMMARY

            The paper proposes a method to distinguish between shape strokes and text strokes based on entropy rate. The authors state that the entropy rate is significantly higher for text strokes than for shape strokes. They propose a single feature, the zero-order entropy rate, which achieves a correct classification rate of 92.06% with a trained threshold. The zero-order entropy is determined independently of the previous symbol at each sketch point; each point is assigned a symbol value based on the angle it forms between neighboring points of the text or shape.
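The zero-order entropy itself is easy to compute once each point's angle has been quantized into a symbol. The quantization step is omitted here, so this is only an illustrative sketch of the feature, not the authors' exact implementation; intuitively, a stroke whose direction barely changes yields one dominant symbol and near-zero entropy, while wiggly text yields many symbols and high entropy:

```python
import math

def zero_order_entropy(symbols):
    """Shannon entropy (in bits) of a stroke's quantized-angle symbol
    sequence, treating each symbol independently of its predecessor."""
    counts = {}
    for s in symbols:
        counts[s] = counts.get(s, 0) + 1
    n = len(symbols)
    return -sum(c / n * math.log2(c / n) for c in counts.values())
```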

DISCUSSION

             The authors argued that using an arbitrary set of features and a trial-and-error approach is time consuming, with which I totally agree. I had the same thing in mind when reading the ink-feature-based recognition paper built on statistical analysis. Time is something to worry about in a sketch system with both text and shapes: we need not only to distinguish them but also to recognize them against library sketches. The proposed system seems to be a simpler way to achieve text/shape classification, but it's still not the best solution when it comes to complex shapes, which can be quite similar in entropy even to a simple character set.

Reading #13. Ink Features for Diagram Recognition

COMMENTS ON OTHERS:

            Francisco Vides 

SUMMARY

             The paper proposes using formal statistical analysis methods to identify key ink features that improve recognition. The features measure aspects of an ink stroke's curvature, size, time, and intersections, and use similar aspects to detect relationships between strokes. The proposed approach begins by investigating a range of possible ink features, how to collect and analyze the feature data, and the initial results of evaluating a text/shape divider based on the key ink feature set. The proposed feature set includes 46 features grouped into 7 categories: size, time, intersection, curvature, pressure, operating system recognition values, and inter-stroke gaps. 
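To make the feature categories concrete, here is a toy extractor computing a few features in the spirit of the named categories (size, time, curvature). This is my own illustration; the paper's full 46-feature set is not reproduced here, and the exact definitions likely differ:

```python
import math

def ink_features(stroke):
    """Compute a handful of example features for a stroke given as a
    list of (x, y, t) samples. Hypothetical definitions, chosen only
    to illustrate the size/time/curvature categories."""
    xs = [p[0] for p in stroke]
    ys = [p[1] for p in stroke]
    ts = [p[2] for p in stroke]
    # size: bounding box and total ink length
    width, height = max(xs) - min(xs), max(ys) - min(ys)
    length = sum(math.hypot(x1 - x0, y1 - y0)
                 for (x0, y0, _), (x1, y1, _) in zip(stroke, stroke[1:]))
    # curvature: total absolute turning angle along the stroke
    angles = [math.atan2(y1 - y0, x1 - x0)
              for (x0, y0, _), (x1, y1, _) in zip(stroke, stroke[1:])]
    turning = sum(abs(a1 - a0) for a0, a1 in zip(angles, angles[1:]))
    return {
        "width": width, "height": height,   # size
        "ink_length": length,               # size
        "duration": ts[-1] - ts[0],         # time
        "total_turning": turning,           # curvature
    }
```

A straight horizontal stroke, for instance, should show zero height and zero turning, while a letter stroke would score high on turning relative to its size.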

DISCUSSION

            Am I missing it, or is the 46-feature set not actually listed in the paper? Also, I'm wondering whether this particular technique is just to divide the whole sketch into 2 groups, text or shape, and not to further identify each individual component against a library component. The authors state that they concentrated on identifying the distinguishing features of text versus shape strokes using a formal method for optimal ink feature selection; does this mean only identifying those 2 stated groups? I know this is important when we have a system like COA, where text and shapes are possibly mixed with each other, but how feasible is this approach when recognizing something within a time limit, say sixty seconds? We run a recognition engine to separate text and shape and then again apply a recognition engine to each shape and text separately?

Friday, October 15, 2010

Reading #12. Constellation Models for Sketch Recognition

Comments on Others:

Marty

Summary

            This paper describes a constellation, or pictorial structure, model that recognizes strokes in sketches by capturing the structure of a particular class of objects based on its local features. The system learns a probabilistic model from example sketches with known stroke labelings. The recognition algorithm then determines a maximum likelihood labeling for an unlabeled sketch by searching the space of possible label assignments with a multi-pass branch and bound algorithm. In the model, objects are represented by a constellation model based on the features of pairs of parts. To manage the complexity of handling n features, each feature is designated either mandatory or optional.

            According to the authors, the sketch recognition process has two phases: first searching for possible mandatory labels and then for optional labels. The search is a maximum likelihood (ML) procedure, and possible label assignments are explored with a branch-and-bound search tree. The authors propose multiple thresholds and hard constraints to avoid situations with a large number of mandatory labels or a large number of strokes.
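The branch-and-bound part is the piece I can sketch: expand partial label assignments stroke by stroke and prune any branch whose optimistic bound cannot beat the best complete labeling found so far. This is a generic toy version under my own assumptions; the paper's multi-pass procedure additionally separates mandatory from optional labels:

```python
def branch_and_bound(strokes, labels, score, bound):
    """Search over label assignments for `strokes`. `score` rates a
    complete assignment; `bound` gives an optimistic upper bound for a
    partial one, so hopeless branches are pruned early."""
    best = (float("-inf"), None)

    def recurse(assignment):
        nonlocal best
        if len(assignment) == len(strokes):
            s = score(assignment)
            if s > best[0]:
                best = (s, list(assignment))
            return
        if bound(assignment) <= best[0]:
            return  # prune: this branch cannot beat the current best
        for lab in labels:
            assignment.append(lab)
            recurse(assignment)
            assignment.pop()

    recurse([])
    return best[1]
```

With a scoring function that counts agreements with a target labeling, the search recovers the target without visiting every one of the |labels|^n complete assignments.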

Discussion

             The idea of using a constellation model for the features of a sketch is an interesting one. I'm just wondering how useful this is for our second project to identify course of action diagrams????

            In my opinion, restricting a user to draw a sketch under the assumption that mandatory and optional features exist is sort of constraining the user's freehand sketching. Yes, this makes life much easier for the authors by making the search less complex, but I guess they are forgetting the golden rule of sketch recognition……….”freehand sketching”……….. 

Find the paper here.   

Thursday, October 14, 2010

Reading #11. LADDER, a sketching language for user interface developers

Comments on Others:

 Hong-Hoe (Ayden) Kim 

Summary

             LADDER is a language to describe how sketched diagrams in a domain are drawn, displayed, and edited. According to the authors, LADDER structural descriptions can be automatically transformed into domain-specific shape recognizers, editing recognizers, and shape exhibitors for use in a sketch recognition domain. Furthermore, the authors state that LADDER is the first sketch description language, as well as the first implemented prototype system proving that such a framework can automatically generate a sketch interface for a domain from only a domain description. The entire LADDER system consists of 3 subsystems: Domain Description, Translation, and Sketch Recognition. The authors also state that the language is a combination of predefined shapes, constraints, editing behaviors, display methods, and domain description syntax. Still, the domain descriptions are easy to specify and have enough detail for accurate sketch recognition.

            LADDER is limited in what it can describe and so is defined over a restricted set of shapes: things like abstract shapes, overly complicated irregular shapes, and shapes not based on LADDER primitives are hard to express, and it works best in domains with few curves or little curve detail. The LADDER system uses vectors to define a variable number of components. 
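To give a feel for what a structural description contains, here is a rough stand-in written as a Python dictionary. This is only my paraphrase of the concept (components built from primitives, plus constraints, editing behaviors, and display methods); LADDER's actual declarative syntax and constraint names differ:

```python
# Hypothetical LADDER-style description of an "Arrow" shape.
# Component types, constraint names, and editing gestures here are
# illustrative, not LADDER's real vocabulary.
arrow = {
    "name": "Arrow",
    "components": {
        "shaft": "Line",   # built from LADDER primitives
        "head1": "Line",
        "head2": "Line",
    },
    "constraints": [
        ("coincident", "shaft.p2", "head1.p1"),  # heads meet the shaft tip
        ("coincident", "shaft.p2", "head2.p1"),
        ("equalLength", "head1", "head2"),
        ("acuteAngle", "head1", "shaft"),
    ],
    "editing": [("hold-and-drag", "move")],
    "display": ["draw all components"],
}
```

The point is that a description like this carries enough structure for a translator to generate a recognizer for the shape automatically, without any recognition code written by hand.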

Discussion

            I'm not sure why there is a restriction on shapes based on anything other than the primitives defined in LADDER? Maybe I'm not getting the concept right: are there any primitives other than a point, a line, and an arc (in my opinion)? We should be able to define any other shape by simply using only these 3 primitives, and I vaguely remember using only these 3 to do graphics programming during my undergraduate years. Correct me if I'm wrong…….      

Find the paper here.   

Reading #10. Graphical Input Through Machine Recognition of Sketches

Comments on Others:

Danielle

Summary

             This paper proposes an interactive system for graphical input in which the user openly participates in training the machine and massaging the data at all levels of interpretation. It includes 3 experiments: HUNCH, which tries to answer the question of whether a syntax for sketching exists; architectural knowledge in recognizing a sketch; and user involvement in the input process.

            HUNCH is a set of FORTRAN programs with different levels of interpretation for processing freehand sketches drawn with a data tablet or a light pen. HUNCH used a program called STRAIT as a corner finder, which is actually based on the speed of the pen. Another program called CURVIT was used to find curves instead of corners. Even with the sketch recognition capabilities of CURVIT and STRAIT, the authors found differences when different people used the system, owing to individual human sketching behavior. Due to latching problems, STRAIT was rewritten without latching and called STRAIN. Overtracing was another interpretation problem the authors faced during HUNCH development. The newest approach was a much more interactive system that involves the user in the machine's decisions, consisting of a database and programs to manipulate the data. That program is based on inference-making procedures, which differ from HUNCH, STRAIN, and LATCH.
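The speed-based corner-finding idea behind STRAIT is easy to illustrate: the pen slows down at corners, so points drawn well below the mean pen speed are corner candidates. A minimal sketch of that principle (the 40% threshold is my own guess, not a value from the paper):

```python
import math

def find_corners(points, times, speed_frac=0.4):
    """Flag points where pen speed drops below a fraction of the mean
    speed, in the spirit of STRAIT's speed-based corner finding.
    `points` is a list of (x, y); `times` the matching timestamps."""
    speeds = []
    for (x0, y0), (x1, y1), t0, t1 in zip(points, points[1:], times, times[1:]):
        dt = max(t1 - t0, 1e-9)                      # guard zero intervals
        speeds.append(math.hypot(x1 - x0, y1 - y0) / dt)
    threshold = speed_frac * (sum(speeds) / len(speeds))
    # speeds[i] belongs to the segment ending at point i + 1
    return [i + 1 for i, s in enumerate(speeds) if s < threshold]
```

On an "L"-shaped stroke sampled at a constant rate, the short, slow segment at the bend is the only one flagged.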

Discussion

              The paper is about a GUI to input and recognize freehand sketches, but the overall picture of the system is too hazy to understand. The authors start with a legacy system they used before, the difficulties with that old system, and how they overcame those in the new system. Without some performance measures and an overall picture of the system, it is hard to imagine how this new system works towards sketch recognition. I'm lost somewhere in the middle of the paper……………..!!! 

Find the paper here.

Wednesday, October 13, 2010

Reading #9. PaleoSketch: Accurate Primitive Sketch Recognition and Beautification

Comments on Others:

Francisco

Summary
           The paper proposes a new low-level recognition and beautification system that recognizes 8 primitive shapes, as well as combinations of these primitives, with a recognition rate of 98.56%. The Paleo process begins with a pre-recognition calculation and then sends the results to low-level shape recognizers for further processing. Each low-level recognizer corresponds to a particular primitive shape and returns a Boolean flag specifying whether the recognizer passed or failed, as well as a beautified shape object that best fits the input stroke. After all shape tests are executed, a hierarchy function sorts the interpretations in order of best fit. The pre-recognition step eliminates duplicate points and then creates a series of graphs, including a direction graph, a speed graph, and a curvature graph, and finds corners with a simple corner-finder algorithm. In addition, this phase computes 2 new features, NDDE and DCR, to differentiate polylines from curves.  
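Since NDDE and DCR are the paper's novel features, here is my rough rendering of the two ideas: NDDE measures how much of the stroke's length lies between its direction extremes (near 1 for curves), and DCR is the ratio of the maximum direction change to the mean direction change (high for polylines with sharp corners). The code below is a simplified sketch; the paper also handles details like removing stroke tails first:

```python
import math

def _directions(points):
    return [math.atan2(y1 - y0, x1 - x0)
            for (x0, y0), (x1, y1) in zip(points, points[1:])]

def _seg_lengths(points):
    return [math.hypot(x1 - x0, y1 - y0)
            for (x0, y0), (x1, y1) in zip(points, points[1:])]

def ndde(points):
    """Normalized Distance between Direction Extremes: stroke length
    between the max- and min-direction points over total length.
    Smooth curves score near 1.0; polylines score lower."""
    d = _directions(points)
    lens = _seg_lengths(points)
    i, j = sorted((d.index(max(d)), d.index(min(d))))
    return sum(lens[i:j + 1]) / max(sum(lens), 1e-9)

def dcr(points):
    """Direction Change Ratio: max direction change over mean direction
    change. Sharp corners drive this up; smooth curves keep it low."""
    d = _directions(points)
    changes = [abs(b - a) for a, b in zip(d, d[1:])]
    return max(changes) / max(sum(changes) / len(changes), 1e-9)
```

On a circular arc the direction changes are all roughly equal, so DCR stays near 1, while an "L"-shaped polyline concentrates all its turning at one corner and scores much higher.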

Discussion 
             After using Paleo in our first class project, I'm quite satisfied with its processing capabilities w.r.t. sketch recognition. It's a powerful corner finder tool for getting the basic primitive shapes of a sketch, which you can then use in devising other sketch recognition algorithms (which makes life much easier). From my personal experience with handwriting recognition, I feel Paleo is a successful way of doing the preprocessing of an image for other low-level tasks. I used to apply image processing techniques like binarization and skeletonization and then fuzzy rules to get primitive shapes and, yes, things are daunting that way.
           I'm just wondering whether Paleo is capable of giving a few more feature properties of interest??? Things like positive/negative slanted lines, horizontal/vertical lines, U-like, inverse U-like, V-like and inverse V-like??? I'm not sure whether Paleo already does this or not, but if so, things are pretty good for a new character recognition pre-processing task using PaleoSketch………..

Find the paper here

Monday, September 20, 2010

Sokushinbutsu (Self-Mummification) - Buddhism, 1/1000 ways to die………….

I thought this self-immolation video was too emotional (it was done to protest and bring the world's attention to the persecution of Buddhists). How WRONG AM I?????

Sokushinbutsu were Buddhist monks or priests who caused their own deaths in a way that resulted in their mummification. This practice reportedly took place almost exclusively in northern Japan around the Yamagata Prefecture. It is believed that many hundreds of monks tried, but only between 16 and 24 such mummifications have been discovered to date. The practice is not advocated or practiced today by any Buddhist sect. The practice was thought to be extinct in modern Japan, but a recent example was discovered in Tokyo in July 2010.


For 1,000 days (a little less than three years) the priests would eat a special diet consisting only of nuts and seeds, while taking part in a regimen of rigorous physical activity that stripped them of their body fat. They then ate only bark and roots for another thousand days and began drinking a poisonous tea made from the sap of the Urushi tree, normally used to lacquer bowls.

This caused vomiting and a rapid loss of bodily fluids, and most importantly, it made the body too poisonous to be eaten by maggots. Finally, a self-mummifying monk would lock himself in a stone tomb barely larger than his body, where he would not move from the lotus position. His only connection to the outside world was an air tube and a bell. Each day he rang a bell to let those outside know that he was still alive.

When the bell stopped ringing, the tube was removed and the tomb sealed. After the tomb was sealed, the other monks in the temple would wait another 1,000 days, and open the tomb to see if the mummification was successful. 

If the monk had been successfully mummified, they were immediately seen as a Buddha and put in the temple for viewing. Usually, though, there was just a decomposed body. Although they weren't viewed as a true Buddha if they weren't mummified, they were still admired and revered for their dedication and spirit.






Saturday, September 11, 2010

Reading #8. A Lightweight Multistroke Recognizer for User Interface Prototypes

Comments on Others:

Youyou Wang 

Summary
           The paper describes the $N recognizer, a lightweight, concise multi-stroke recognizer that is a significant extension of the $1 uni-stroke recognizer. $N is said to be capable of recognizing user-defined complex gestures, supporting customization, and operating at speeds that support fluid interaction (not sure what that means though???). $N extends $1 to overcome limitations of that version, such as its uni-stroke nature, its failure to recognize 1D gestures, and its rotation invariance. More specifically, the $N recognizer is built for rapid prototyping, as a way of quickly incorporating recognition into user interfaces with a small lines-of-code footprint.
           The intuitive idea behind $N is to support rapid prototyping by eliminating the need to enter every permutation of a multistroke gesture: the user enters only a single version of the gesture, and that is used for recognition. This is done by creating unistroke permutations of the multistroke at definition time and then using those for comparison at run time. $N automatically differentiates 1D from 2D gestures using a threshold, so that 1D gestures can have their aspect ratio preserved.
           The paper also describes $N's limitations, such as its provisions for scale and position and its not using gesture features. 
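The define-time expansion described above can be sketched in a few lines: every ordering of the strokes combined with every per-stroke drawing direction, each concatenated into one unistroke. This is my illustration of the idea, not the paper's implementation (which also resamples and scales the results before matching):

```python
import itertools

def unistroke_permutations(strokes):
    """Build every unistroke reading of a multistroke gesture: all
    stroke orders x both directions per stroke, concatenated into one
    point sequence. For k strokes this yields k! * 2^k unistrokes, which
    is why the user only has to enter the gesture once."""
    results = []
    for order in itertools.permutations(range(len(strokes))):
        for dirs in itertools.product((1, -1), repeat=len(strokes)):
            unistroke = []
            for idx, d in zip(order, dirs):
                unistroke.extend(strokes[idx][::d])  # d == -1 reverses
            results.append(unistroke)
    return results
```

A candidate stroke is then compared against all of these at run time, with the closest permutation deciding the match.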

Discussion
            First, the name $N is kind of amusing; maybe they took the name $1 literally and replaced the 1 with N to represent the multi-stroke capability of their recognition algorithm, compared to the $1 recognizer's uni-stroke nature. But I guess the $1 recognizer's authors used that name to represent how simple and inexpensive it is (shorter implementation time and small size) as well as its being uni-stroked. When it comes to $N, the idea that comes to a reader's mind is that this thing is N times as expensive to implement (that much more complex) than the $1 recognizer (don't laugh, that's how I feel).
           One of the best goals of $N, as I see it, is the ability to do recognition with minimal input support (just a single multistroke entry) compared to other available recognizers, including $1. The use of just the Euclidean distance for comparison between the candidate stroke and the unistroke permutations is questionable, and in my opinion not the correct technique. Maybe an RMSE value between the candidate and the unistroke permutation would be a good way to verify accuracy, and maybe to increase performance.
           $N requires a separate step to distinguish 1D gestures from 2D ones, and in my opinion this could be avoided by a size-invariant algorithmic design, maybe a comparison based on the basic segment structure (example: a square consists of 2 horizontal and 2 vertical lines of similar length, and a triangle consists of 2 slanted lines and 1 horizontal line).

Find the paper here.