Intelligent User Interfaces Laboratory
We are interested in enabling natural human-computer interaction by combining techniques from machine learning, computer vision, computer graphics, human-computer interaction and psychology. Specific areas that we focus on include multimodal human-computer interfaces, affective computing, pen-based interfaces, sketch-based applications, intelligent user interfaces, and applications of computer vision and machine learning to real-world problems. Browse through the publications and research pages to get a flavor of IUI@Koc.
Publications
2023
120) Soykan G., Yuret D., T. M. Sezgin, “Identity-Aware Semi-Supervised Learning for Comic Character Re-Identification,” arXiv preprint arXiv:2308.09096, 2023.
119) Sabuncuoglu A., T. M. Sezgin, “Developing a Multimodal Classroom Engagement Analysis Dashboard for Higher-Education,” Proceedings of the ACM on Human-Computer Interaction, 2023.
118) T. M. Sezgin, “Online Interpretation of Sketched Drawings,” Interactive Sketch-based Interfaces and Modelling for Design, 2023.
117) Buyukyazi T., Korkmaz M., T. M. Sezgin, “HAISTA-NET: Human Assisted Instance Segmentation Through Attention,” arXiv preprint arXiv:2305.03105, 2023.
116) Sabuncuoğlu A., Besevli C., T. M. Sezgin, “Towards Building Child-Centered Machine Learning Pipelines: Use Cases from K-12 and Higher-Education,” arXiv preprint arXiv:2304.09532, 2023.
115) Sabuncuoğlu A., T. M. Sezgin, “Multimodal Group Activity Dataset for Classroom Engagement Level Prediction,” arXiv preprint arXiv:2304.08901, 2023.
114) Triantafyllopoulos A., Schuller B. W., İymen G., He X., Yang Z., Tzirakis P., Liu S., Mertes S., André E., T. M. Sezgin et al., “An overview of affective speech synthesis and conversion in the deep learning era,” IEEE, 2023.
113) Kesim E., Numanoglu T., Bayramoglu O., Turker B. B., Hussain N., Yemez Y., Erzin E., T. M. Sezgin, “The eHRI database: a multimodal database of engagement in human–robot interactions,” Language Resources & Evaluation, 2023.
2022
112) Soykan G., Yuret D., and T. M. Sezgin, “A comprehensive gold standard and benchmark for comics text detection and recognition,” arXiv preprint arXiv:2212.14674, 2022.
111) Akman A., Sahillioğlu Y., T. M. Sezgin, “Deep generation of 3D articulated models and animations from 2D stick figures,” Computers & Graphics, Volume 109, Pages 65-74, 2022.
110) Topal B. B., Yuret D., and T. M. Sezgin, “Domain-adaptive self-supervised pre-training for face & body detection in drawings,” arXiv preprint arXiv:2211.10641, 2022.
109) Sabuncuoglu A., and T. M. Sezgin, “Exploring children’s use of self-made tangibles in programming,” arXiv preprint arXiv:2210.06258, 2022.
108) N. Hussain, E. Erzin, Y. Yemez and T. M. Sezgin, “Training Socially Engaging Robots: Modeling Backchannel Behaviors with Batch Reinforcement Learning,” IEEE Transactions on Affective Computing, 2022.
107) Sabuncuoglu A., and T. M. Sezgin, “Kart-ON: An Extensible Paper Programming Strategy for Affordable Early Programming Education,” Proceedings of the ACM on Human-Computer Interaction 6.EICS, 2022.
106) Sabuncuoğlu A., and T. M. Sezgin, “Prototyping Products using Web-based AI Tools: Designing a Tangible Programming Environment with Children,” 6th FabLearn Europe/MakeEd Conference, 2022.
105) Çelik B., Dede E., and T. M. Sezgin, “A Criticism on Popular Sketch Datasets,” 30th Signal Processing and Communications Applications Conference (SIU), IEEE, 2022.
104) Sabuncuoğlu A., and T. M. Sezgin, “A Critical Evaluation of Recent Deep Generative Sketch Models from a Human-Centered Perspective,” 30th Signal Processing and Communications Applications Conference (SIU), IEEE, 2022.
103) Yanık E., and T. M. Sezgin, “Active Sketch Scene Learning,” Available at SSRN 4084576, 2022.
2021
102) Ö. Z. Bayramoğlu, E. Erzin, T. M. Sezgin, and Y. Yemez, “Engagement Rewarded Actor-Critic with Conservative Q-Learning for Speech-Driven Laughter Backchannel Generation,” In Proceedings of the 2021 International Conference on Multimodal Interaction (ICMI ’21), 2021.
101) A. Sabuncuoğlu, A. E. Yantaç, T. M. Sezgin, “Teaching K-12 Classrooms Data Programming: A Three-Week Workshop with Online and Unplugged Activities,” arXiv preprint arXiv:2110.05303, 2021.
100) A. Zindancıoğlu and T. M. Sezgin, “Perceptually Validated Precise Local Editing for Facial Action Units with StyleGAN,” arXiv preprint arXiv:2107.12143, 2021.
99) A. Sabuncuoğlu and T. M. Sezgin, “Developing Affordable Tangible Programming Education Applications Using Mobile Vision,” 2021 29th Signal Processing and Communications Applications Conference (SIU), pp. 1-4, 2021.
98) K. T. Yesilbek, T. M. Sezgin, Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, pp. 2142-2149, 2021.
97) Recep Sinan Tumen, T. M. Sezgin, “Segmentation and Recognition of Offline Sketch Scenes Using Dynamic Programming,” IEEE Computer Graphics and Applications, 2021.
96) Yesilbek, Kemal Tugrul, and T. Metin Sezgin, “Sketch Recognition with Few Examples (vol. 69, pg. 80, 2017),” Computers & Graphics, vol. 94, pp. 191-191, 2021.
2020
95) Z. Bucinca, Y. Yemez, E. Erzin and T. M. Sezgin, “AffectON: Incorporating Affect Into Dialog Generation,” IEEE Transactions on Affective Computing, 2020.
94) A. Sabuncuoğlu, T. M. Sezgin, “Kart-ON: Affordable Early Programming Education with Shared Smartphones and Easy-to-Find Materials,” Proceedings of the 25th International Conference on Intelligent User Interfaces Companion, 2020.
93) A. Akman, Y. Sahillioğlu, T. M. Sezgin, “Generation of 3D Human Models and Animations Using Simple Sketches,” Graphics Interface, 2020.
92) Sadia, S. E. Emgin, T. M. Sezgin and Ç. Başdoğan, “Data-Driven Vibrotactile Rendering of Digital Buttons on Touchscreens,” International Journal of Human-Computer Studies, 2020.
91) Kurmanbek Kaiyrbekov, T. M. Sezgin, “Deep Stroke-Based Sketched Symbol Reconstruction and Segmentation,” IEEE Computer Graphics and Applications, 2020.
2019
90) Biswas P., Orero P., T. M. Sezgin, “Special Issue on Intelligent Interaction Design,” Artificial Intelligence for Engineering Design, Analysis and Manufacturing.
89) S. E. Emgin, A. Aghakhani, T. M. Sezgin and C. Basdogan, “HapTable: An Interactive Tabletop Providing Online Haptic Feedback for Touch Gestures,” in IEEE Transactions on Visualization and Computer Graphics, vol. 25, no. 9, pp. 2749-2762.
88) Alexandra Bonnici, Alican Akman, Gabriel Calleja, Kenneth P. Camilleri, Patrick Fehling, Alfredo Ferreira, Florian Hermuth, Johann Habakuk Israel, Tom Landwehr, Juncheng Liu, Natasha M. J. Padfield, T. M. Sezgin and Paul L. Rosin, “Sketch-based interaction and modeling: where do we stand?,” Artificial Intelligence for Engineering Design, Analysis and Manufacturing.
87) K. Kaiyrbekov, T. M. Sezgin, “Stroke-based Sketched Symbol Reconstruction and Segmentation,” IEEE Computer Graphics and Applications.
86) N. Hussain, E. Erzin, T. M. Sezgin, Y. Yemez, “Speech Driven Backchannel Generation using Deep Q-Network for Enhancing Engagement in Human-Robot Interaction,” INTERSPEECH: Annual Conference of the International Speech Communication Association, Graz, Austria.
85) N. Hussain, E. Erzin, T. M. Sezgin, Y. Yemez, “Batch Recurrent Q-Learning for Backchannel Generation Towards Engaging Agents,” 8th International Conference on Affective Computing and Intelligent Interaction (ACII).
84) N. Alyuz, T. M. Sezgin, “Interpretable Machine Learning for Generating Semantically Meaningful Formative Feedback,” CVPR, IEEE Conference on Computer Vision and Pattern Recognition, Workshop on Explainable AI, Long Beach, CA.
83) Erelcan Yanik, T. M. Sezgin, “Active Scene Learning,” arXiv preprint arXiv:1903.02832.
2018
82) T. M. Sezgin, Ozem Kalay, “Sketch misrecognition correction system based on eye gaze monitoring,” US Patent.
81) Doğancan Kebüde, Cem Eteke, T. M. Sezgin, Barış Akgün, “Communicative Cues for Reach-to-Grasp Motions: From Humans to Robots,“ Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems, July 10-15, Stockholm, Sweden.
80) B. Berker Türker, T. M. Sezgin, Yücel Yemez, Engin Erzin, “Multimodal prediction of head nods in dyadic conversations,” 2018 26th Signal Processing and Communications Applications Conference (SIU).
79) Stéphane Dupont, Ozan Can Altiok, Aysegül Bumin, Ceren Dikmen, Ivan Giangreco, Silvan Heller, Emre Külah, Gueorgui Pironkov, Luca Rossetto, Yusuf Sahillioglu, Heiko Schuldt, Omar Seddati, Yusuf Setinkaya, T. M. Sezgin, Claudiu Tanase, Emre Toyan, Sean Wood, Doguhan Yeke,“VideoSketcher: Innovative Query Modes for Searching Videos through Sketches, Motion and Sound“, University of Mons.
78) Erik Marchi, Bjorn Schuller, Alice Baird, Simon Baron-Cohen, Amandine Lassalle, Helen O’Reilly, Delia Pigat, Peter Robinson, Ian Davies, Tadas Baltrusaitis, Ofer Golan, Shimrit Fridenson-Hayo, Shahar Tal, Shai Newman, Noga Meir-Goren, Antonio Camurri, Stefano Piana, Sven Bolte, T. M. Sezgin, Nese Alyuz, Agnieszka Rynkiewicz, Aurelie Baranger, “The ASC-Inclusion Perceptual Serious Gaming Platform for Autistic Children,” IEEE Transactions on Games.
77) B. Türker, E. Erzin, Y. Yemez, and T. M. Sezgin, “Audio-Visual Prediction of Head-Nod and Turn-Taking Events in Dyadic Interactions,” Proc. Interspeech 2018, Hyderabad, India.
76) L. Devillers and S. Rosset and G. Dubuisson Duplessis and L. Bechade and Y. Yemez and B. B. Turker and T. M. Sezgin and E. Erzin and K. El Haddad and S. Dupont and P. Deleglise and Y. Esteve and C. Lailler and E. Gilmartin and N. Campbell, “Multifaceted Engagement in Social Interaction with a Machine: The JOKER Project,” 2018 13th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2018), Xi’an, China, 2018, pp. 697-701
75) Ç. Çığ and T. M. Sezgin, “Gaze-based predictive user interfaces: Visualizing user intentions in the presence of uncertainty,” International Journal of Human-Computer Studies, Vol: 111, pp. 78-91.
2017
74) B. Alper, N. H. Riche, F. Chevalier, J. Boy and T. M. Sezgin, “Visualization Literacy at Elementary School,” Conference on Pen and Touch Technology in Education.
73) V. Rudakova, N. Lin, N. Trayan, T. M. Sezgin, J. Dorsey and H. Rushmeier, “CHER-ish: A sketch- and image-based system for 3D representation and documentation of cultural heritage sites,” EUROGRAPHICS Workshop on Graphics and Cultural Heritage.
72) B. Türker, Y. Yemez, T. M. Sezgin, E. Erzin, “Audio-Facial Laughter Detection in Naturalistic Dyadic Conversations,” IEEE Transactions on Affective Computing.
71) K. T. Yeşilbek, T. M. Sezgin, “Sketch Recognition with Few Examples,” Computers & Graphics.
70) O. C. Altıok, T. M. Sezgin, “Characterizing User Behavior for Speech and Sketch-based Video Retrieval Interfaces,” Proceedings of Expressive 2017, Posters, Artworks, and Bridging Papers, the Eurographics Association, Los Angeles, CA, USA.
69) B. B. Türker, Z. Buçinca, E. Erzin, Y. Yemez, and T. M. Sezgin, “Analysis of Engagement and User Experience with a Laughter Responsive Social Robot,” Proc. Interspeech, pp. 844-848.
68) W. Shi, Z. Wang, T. M. Sezgin, J. Dorsey, H. Rushmeier, “Material Design in Augmented Reality with In-Situ Visual Feedback,” Proceedings of the Eurographics Symposium on Rendering, Helsinki, Finland.
67) Y. Sahillioglu, T. M. Sezgin, “Sketch-based Articulated 3D Shape Retrieval,” IEEE Computer Graphics and Applications.
66) B. Alper, N. Riche, F. Chevalier, J. Boy, T. M. Sezgin, “Visualization Literacy at Elementary School,” In Proceedings of ACM CHI 2017, Conference on Human Factors in Computing Systems, Denver CO, May 6-11. Honorable Mention Award (top 5% of accepted publications).
2016
65) O. C. Altıok, K. T. Yesilbek, T. M. Sezgin, “What Auto Completion Tells Us About Sketch Recognition,” In Proceedings of Expressive 2016, Posters, Artworks, and Bridging Papers, the Eurographics Association, Lisbon, Portugal.
64) Ş. Çakmak, T. M. Sezgin, “Building a Gold Standard for Perceptual Sketch Similarity,” In Proceedings of Expressive 2016, Posters, Artworks, and Bridging Papers, the Eurographics Association, Lisbon, Portugal.
63) Ç. Çığ, T. M. Sezgin, “Gaze-Based Biometric Authentication: Hand-Eye Coordination Patterns as a Biometric Trait,” In Proceedings of Expressive 2016, Posters, Artworks, and Bridging Papers, the Eurographics Association, Lisbon, Portugal.
62) C. Tanase, I. Giangreco, L. Rossetto, H. Schuldt, O. Seddati, S. Dupont, O. C. Altıok, T. M. Sezgin, “Semantic Sketch-Based Video Retrieval with Autocompletion,” Proceedings of the 21st International Conference on Intelligent User Interfaces (IUI 2016), ACM, March 7–10, 2016, Sonoma, CA, USA.
61) L. Rossetto, I. Giangreco, S. Heller, C. Tanase, H. Schuldt, O. Seddati, S. Dupont, T. M. Sezgin, O. C. Altıok, Y. Sahillioglu, “IMOTION – Searching for Video Sequences using Multi-Shot Sketch Queries,” Proceedings of the 22nd International Conference on MultiMedia Modeling, Miami.
60) L. Rossetto, I. Giangreco, C. Tanase, H. Schuldt, O. Seddati, S. Dupont, T. M. Sezgin, Y. Sahillioglu, “iAutoMotion – an Autonomous Content-based Video Retrieval Engine,” Proceedings of the 22nd International Conference on MultiMedia Modeling, Miami.
2015
59) L. Devillers, S. Rosset, G. Dubuisson Duplessis, M. A. Sehili, L. Béchade, A. Delaborde, C. Gossart, V. Letard, F. Yang, Y. Yemez, B. B. Türker, T. M. Sezgin, K. El Haddad, S. Dupont, D. Luzzati, Y. Estève, E. Gilmartin, N. Campbell, “Multimodal Data Collection of Human-Robot Humorous Interactions in the JOKER Project,” Proceedings of the 6th International Conference on Affective Computing and Intelligent Interaction, Xi’an, China.
58) B. Schuller, E. Marchi, S. Baron-Cohen, A. Lassalle, H. O’Reilly, D. Pigat, P. Robinson, I. Davies, T. Baltrusaitis, M. Mahmoud, O. Golan, S. Fridenson, S. Tal, S. Newman, N. Meir, R. Shillo, A. Camurri, S. P., A. Staglianò, S. Bölte, D. Lundqvist, S. Berggren, A. Baranger, N. Sullings, T. M. Sezgin, N. Alyuz, A. Rynkiewicz, K. Ptaszek, K. Ligmann, “Recent developments and results of ASC-Inclusion: An Integrated Internet-Based Environment for Social Inclusion of Children with Autism Spectrum Conditions,” Proc. 3rd International Workshop on Digital Games for Empowerment and Inclusion (IDGEI 2015) held in conjunction with the 20th International Conference on Intelligent User Interfaces (IUI 2015), ACM, Atlanta, US.
57) E. Yanik and T. M. Sezgin, “Active Learning for Sketch Recognition,” Computers & Graphics, accepted for publication.
56) Arasan, C. Basdogan, and T. M. Sezgin, “HaptiStylus: A Novel Stylus Capable of Displaying Movement and Rotational Torque Effects,” IEEE Computer Graphics and Applications, accepted for publication, 2015.
55) Caglar Tirkaz, Jacob Eisenstein, T. M. Sezgin and Berrin Yanikoglu, “Identifying visual attributes for object recognition from text and taxonomy,” Computer Vision and Image Understanding.
54) Madan, A. Kucukyilmaz, T. M. Sezgin, and C. Basdogan, “Recognition of Haptic Interaction Patterns in Dyadic Joint Object Manipulation,” IEEE Transactions on Haptics, preprint.
53) L. Rossetto, I. Giangreco, H. Schuldt, S. Dupont, O. Seddati, T. M. Sezgin, Y. Sahillioglu, “IMOTION: A Content-based Video Retrieval Engine,” The 21st International Conference on MultiMedia Modeling, Accepted for publication.
52) Ç. Çığ and T. M. Sezgin, “Real-time activity prediction: a gaze-based approach for early recognition of pen-based interaction tasks,” In Proceedings of the Workshop on Sketch-Based Interfaces and Modeling (SBIM ’15), Eurographics Association, Aire-la-Ville, Switzerland, pp. 59-65.
51) K. T. Yesilbek, C. Sen, S. Cakmak and T. M. Sezgin, “SVM-based sketch recognition: which hyperparameter interval to try?,” In Proceedings of the Workshop on Sketch-Based Interfaces and Modeling (SBIM ’15), Eurographics Association, Aire-la-Ville, Switzerland, pp. 117-121.
2014
50) C. Cig, T. M. Sezgin, “Gaze-Based Prediction of Pen-Based Virtual Interaction Tasks,” International Journal of Human-Computer Studies (TUBITAK A), in press (accepted for publication on 20 September 2014).
49) C. Cig, T. M. Sezgin, “Gaze-Based Virtual Task Predictor,” In Proceedings of the 7th Workshop on Eye Gaze in Intelligent Human Machine Interaction: Gaze in Multimodal Interaction (GazeIn ’14), ACM, presented in Istanbul, Turkey, 16 November 2014.
2013
48) Sezgin, T. M. Sezgin, “Finding the Best Portable Congruential Random Number Generators,” Computer Physics Communications.
47) Arasan, C. Basdogan, T. M. Sezgin, “Haptic Stylus with Inertial and Vibro-Tactile Feedback,” Proceedings of the World Haptics Conference.
46) R. S. Tumen, T. M. Sezgin, “DPFrag: A Trainable Stroke Fragmentation Framework based on Dynamic Programming,” IEEE Computer Graphics and Applications, Sept.-Oct. 2013, Vol. 33, no. 5, pp. 59-67.
45) A. Kucukyilmaz, T. M. Sezgin, C. Basdogan, “Intention Recognition for Dynamic Role Exchange in Haptic Collaboration,” IEEE Transactions on Haptics, vol. 6, no. 1.
44) Ç. Çığ, T. M. Sezgin, “New modalities, new challenges – Annotating sketching and gaze data,” In Proceedings of the 21st IEEE Signal Processing and Communications Applications Conference (SIU’13), pp. 1-4.
2011
36) C. Tirkaz, B. Yanikoglu, T. M. Sezgin, “Sketched Symbol Recognition with Few Examples Using Particle Filtering,” ACM Symposium on Sketch Based Interfaces and Modeling, Vancouver, Canada.
35) A. Kucukyilmaz, T. M. Sezgin, C. Basdogan, “Conveying intentions through haptics in human-computer collaboration,” IEEE World Haptics Conference 2011, Istanbul, Turkey.
34) R. Arandjelovic, T. M. Sezgin, “Sketch recognition by fusion of temporal and image-based features,” Pattern Recognition, vol. 44, issue 6, pp. 1225-1234.
2010
33) S. Afzal, T. M. Sezgin, P. Robinson, “Decoding Emotions from Facial Animations,” ACM / SSPNET International Symposium on Facial Analysis and Animation, Edinburgh, UK.
32) Y. Gao, Q. Zhao, A. Hao, T. M. Sezgin, N. A. Dodgson, “Automatic construction of 3D animatable facial avatars,” Computer Animation and Virtual Worlds, Vol. 21, Issue 3-4, pp. 343-354, DOI: 10.1002/cav.340.
31) R. Sinan Tumen, M. Emre Acer, T. M. Sezgin, “Feature Extraction and Classifier Combination for Image-based Sketch Recognition,” ACM Symposium on Sketch Based Interfaces and Modeling, Annecy, France.
30) S. O. Oguz, A. Kucukyilmaz, T. M. Sezgin, C. Basdogan, “Haptic Negotiation and Role Exchange for Collaboration in Virtual Environments,” Haptics Symposium, Waltham, Massachusetts, USA.
29) Y. Gao, T. M. Sezgin, N. Dodgson, “Automatic construction of 3D animatable facial models.” International Conference on Computer Animation and Social Agents, Amsterdam, Netherlands.
2009
28) T. M. Sezgin, I. Davies, P. Robinson, “Multimodal inference for driver-vehicle interaction.” International Conference on Multimodal Interfaces, Cambridge, MA.
27) S. Afzal, T. M. Sezgin, Y. Gao, P. Robinson, “Perception of Emotional Expressions in Different Representations Using Facial Feature Points.” IEEE International Conference on Affective Computing and Intelligent Interaction, Amsterdam, Netherlands.
26) T. M. Sezgin, I. Davies, P. Robinson, “Multimodal inference for driver-vehicle interaction.” Workshop on Multimodal Interfaces for Automotive Applications, International Conference on Intelligent User Interfaces, Sanibel, FL.
2008
25) Blessing, T. M. Sezgin, R. Arandjelovic, P. Robinson, “A multimodal interface for road design,” Workshop on Sketch Recognition, International Conference on Intelligent User Interfaces, Sanibel, FL.
24) T. M. Sezgin and R. Davis, “Sketch Recognition in Interspersed Drawings Using Time-Based Graphical Models,” Computers & Graphics Journal.
23) M. Altinel, E. Arpali, T. M. Sezgin, F. Sezgin, F. Gonenc, A. Yazicioglu, “A New Logistic Regression Based Nomogram Developed for Predicting Prostate Biopsy Outcomes in the Turkish Population,” 20th Congress of the Turkish Urology Association, Antalya, Turkey.
22) P. Biswas, T. M. Sezgin and P. Robinson, “Perception Model for People with Visual Impairments.” Proceedings of the 10th International Conf. on Visual Information Systems (LNCS 5188), Salerno, Italy.
2007 - 2001
2007
21) T. M. Sezgin and P. Robinson, “Affective Video Data Collection Using an Automobile Simulator.” Second International Conference on Affective Computing and Intelligent Interaction, Lisbon, Portugal.
20) X Pan, M Gillies, T. M. Sezgin, C. Loscos, “Expressing Complex Mental States Through Facial Expressions.” Second International Conference on Affective Computing and Intelligent Interaction, Lisbon, Portugal.
19) T. M. Sezgin and R. Davis. “Temporal Sketch Recognition in Interspersed Drawings,”Fourth Eurographics Workshop on Sketch-Based Interfaces and Modeling, University of California, Riverside, CA.
18) T. M. Sezgin and R. Davis. “Sketch Interpretation Using Multiscale Models of Temporal Patterns,” IEEE Computer Graphics & Applications Journal, Volume: 27, Issue: 1, pp: 28-37.
17) H. Dibeklioglu, T. M. Sezgin, E. Ozcan, “A Recognizer for Free-Hand Graph Drawings,” International Workshop on Pen-Based Learning Technologies, Catania, Italy.
16) T. M. Sezgin, “Overview of Recent Work in Pen-Centric Computing,” Invited Workshop on Pen-Centric Computing, Providence RI.
15) T. M. Sezgin, “Sketch Interpretation Using Multiscale Models of Temporal Patterns,” In NESCAI ’06, Northeast Student Colloquium on Artificial Intelligence, Ithaca, NY.
2006
14) Sezgin and T. M. Sezgin, “On the Statistical Analysis of Feigenbaum Constants,” Journal of the Franklin Institute, vol. 343, pp. 756-758.
13) T. M. Sezgin, T. Stahovich, and R. Davis, “Sketch Based Interfaces: Early Processing for Sketch Understanding,” SIGGRAPH ’06 Courses, August 2006.
12) T. M. Sezgin and R. Davis, “Scale-space based feature point detection for digital ink,” SIGGRAPH ’06 Courses, August 2006.
11) T. M. Sezgin, T. Stahovich, and R. Davis, “Sketch Based Interfaces: Early Processing for Sketch Understanding,” ACM International Conf. Proc. Series; Vol. 15. Perceptive User Interfaces, Orlando FL.
2005
10) T. M. Sezgin and R. Davis, “HMM-Based Efficient Sketch Recognition,” In Proceedings of the International Conference on Intelligent User Interfaces (IUI’05), San Diego, CA.
9) T. M. Sezgin and R. Davis, “Modeling Online Sketching as a Dynamic Process,” In Proceedings of CSAIL Student Workshop ’05 Gloucester, MA.
2004
8) T. M. Sezgin and R. Davis, “Handling Overtraced Strokes in Hand-Drawn Sketches,” In Proceedings of the AAAI Spring Symposium Series: Making Pen-Based Interaction Intelligent and Natural, Washington DC.
7) T. M. Sezgin and R. Davis, “Scale-space Based Feature Point Detection for Digital Ink,” In Proceedings of the AAAI Spring Symposium Series: Making Pen-Based Interaction Intelligent and Natural, Washington DC.
2003
6) T. M. Sezgin, “Recognition efficiency issues for freehand sketches,” Proceedings of the MIT Student Oxygen Workshop, Gloucester, MA.
2002
5) Randall Davis, Aaron Adler, Christine Alvarado, Tracy Hammond, Rebecca Hitchcock, Michael Oltmans, T. M. Sezgin, Olya Veselova, “Designs for the Future,” MIT Artificial Intelligence Laboratory Annual Abstract.
4) Tracy Hammond, T. M. Sezgin, Olya Veselova, Aaron Adler, Michael Oltmans, Christine Alvarado, Rebecca Hitchcock, “Multi-domain sketch recognition,” Proceedings of the 2nd Annual MIT Student Oxygen Workshop.
3) T. M. Sezgin, Randall Davis, “Generating domain specific sketch recognizers from object descriptions,” Proceedings of the MIT Student Oxygen Workshop, July.
2) Randall Davis, Aaron Adler, Christine Alvarado, Tracy Hammond, Rebecca Hitchcock, Michael Oltmans, T. M. Sezgin, Olya Veselova, “Art and the Future,” MIT Artificial Intelligence Laboratory Annual Abstract.
2001
1) T. M. Sezgin, T. Stahovich, and R. Davis, “Sketch Based Interfaces: Early Processing for Sketch Understanding,” ACM International Conference Proc., Vol. 15, Perceptive User Interfaces, Orlando FL.
Theses
Salih Ozgur Oguz, A Negotiation Model for Affective Visuo-Haptic Communication Between a Human Operator and a Machine, M.S. Thesis. Department of Electrical and Computer Engineering, Koc University (2010).
Projects
Completed Projects
- Tangible Intelligent Interfaces for Teaching Computational Thinking Skills, Scientific & Technological Research Council of Turkey, High Priority Areas R&D Program (1003S), (Principal Investigator, Koç University), 2019 – 2021
- Backchannel Feedback Modeling for Human-Computer Interaction ( E.Erzin, Y. Yemez, T.M. Sezgin). Funded by Scientific & Technological Research Council of Turkey, 2018-2020
- JOKER
European Commission ERA-NET Program, 2013-2017. The JOKER project will build and develop a generic intelligent user interface providing a multimodal dialogue system with social communication skills including humor, empathy, compassion, charm, and other informal socially-oriented behavior.
- iMotion
European Commission ERA-NET Program, 2013-2017. The IMOTION project will develop and evaluate innovative multi-modal user interfaces for interacting with augmented videos. Starting with an extension of existing query paradigms (keyword search in manual annotations) and image search (query by example in key frames), IMOTION will consider novel sketch- and speech-based user interfaces.
- ASC-Inclusion (Sponsored by the European Commission) 2011-2014.
The main goal of this project is to develop a computer software program that will assist children with Autism Spectrum Conditions (ASC) to understand and express emotions through facial expressions, tone-of-voice and body gestures. This software will assist them to understand and interact with other people, and as a result, will increase their inclusion in society. Academic partners include University of Cambridge, United Kingdom; Technische Universität München, Germany; Bar Ilan University, Israel; Koç University, Turkey; and Università degli Studi di Genova, Italy.
- Intelligent Interfaces for eLearning
Scientific & Technological Research Council of Turkey, 2013-2016.
The goal of this project is to build the pen-based interfaces for the classroom of the future, and it is funded under the National Priority Areas R&D Program of the Research Council of Turkey (TUBITAK). The scope of the project is not public at the moment. Contact Dr. Sezgin for details.
- Semi-supervised Intelligent Multimodal Content Translator for Smart TVs
SANTEZ Programme, Ministry of Science, Industry, and Technology, Turkey 2012-2014
TVs are slowly morphing into powerful set-top computers with internet connections. As such, they are gradually taking over roles and functions that were traditionally associated with desktop computers. TV users, for example, can use their TV for browsing the internet. Unfortunately, the vast majority of internet content has been designed for desktop viewing, and hence has to be adapted for viewing on a TV. In this project, we aim to develop a semi-automatic content retargeting system, which is expected to work with minimal intervention from an expert.
- Gesture-Based Interfaces
Koç Sistem R&D Programme, 2011-2013
- Pen-based Multimodal Intelligent User Interfaces
Career Grant, Scientific and Technological Research Council of Turkey, 2011-2014
- Educational Sketch-Based Intelligent Interfaces
Turk Telekom R&D Programme, 2010-2013
- Interactive Intelligent Sketching Board
KOLT Teaching Innovation Grant, 2010-2011
- Deep Green: Commander’s Associate
DARPA/BAE/SIFT (British Aerospace/Smart Information Flow Technologies), 2008-2009
- Deep Green: Commander’s Associate
DARPA/SAIC (Science Applications International Corporation), 2008-2009
Non-Sponsored Research Projects
- Early Processing for Sketch Recognition
Freehand sketching is a natural and crucial part of everyday human interaction, especially important in design, yet is unsupported by current design automation software. We are working to combine the flexibility and ease of use of paper and pencil with the processing power of a computer, to produce a design environment that feels as natural as paper, yet is considerably smarter. One of the most basic steps in accomplishing this is converting the original digitized pen strokes in the sketch into the intended geometric objects. We have implemented a system that combines multiple sources of knowledge to provide robust early processing for freehand sketching.
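For the curious, this sort of early processing can be pictured with a tiny snippet. The code below is a hypothetical illustration only (not the lab's published algorithm, and with made-up parameter values): it smooths a digitized stroke and flags high-curvature points as corner candidates, the first step toward turning raw pen samples into geometric primitives.

```python
# Hypothetical illustration only: flag corner candidates on a digitized stroke.
# This is NOT the published algorithm; it merely sketches the general idea of
# smoothing the pen trajectory and keeping points of high turning angle.
import numpy as np

def corner_candidates(points, sigma=2.0, turn_thresh_deg=5.0):
    """points: (N, 2) array of pen samples; returns indices of corner candidates."""
    pts = np.asarray(points, dtype=float)
    # Gaussian smoothing of x and y (a single, fixed scale for simplicity).
    n = int(4 * sigma) | 1
    k = np.exp(-0.5 * ((np.arange(n) - n // 2) / sigma) ** 2)
    k /= k.sum()
    pad = n // 2
    xs = np.convolve(np.pad(pts[:, 0], pad, mode="edge"), k, mode="valid")
    ys = np.convolve(np.pad(pts[:, 1], pad, mode="edge"), k, mode="valid")
    # Per-sample turning angle of the tangent direction.
    theta = np.unwrap(np.arctan2(np.gradient(ys), np.gradient(xs)))
    turn = np.abs(np.gradient(theta))
    thresh = np.deg2rad(turn_thresh_deg)
    # Keep local maxima of the turning angle above the threshold.
    return [i for i in range(1, len(turn) - 1)
            if turn[i] > thresh and turn[i] >= turn[i - 1] and turn[i] >= turn[i + 1]]

if __name__ == "__main__":
    # An L-shaped stroke: the corner should be reported near sample index 19.
    stroke = [(i, 0) for i in range(20)] + [(19, i) for i in range(1, 20)]
    print(corner_candidates(stroke))
```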
Selected Publications
- Tevfik Metin Sezgin. Feature Point Detection and Curve Approximation for Early Processing of Free-Hand Sketches. Master’s Thesis, Department of EECS, MIT, May 2001.
- Tevfik Metin Sezgin and Randall Davis. Handling Overtraced Strokes in Hand-Drawn Sketches. In Making Pen-Based Interaction Intelligent and Natural, 2004.
- Tevfik Metin Sezgin and Randall Davis. Scale-space Based Feature Point Detection for Digital Ink. In Making Pen-Based Interaction Intelligent and Natural, 2004.
- Tevfik Metin Sezgin, Thomas Stahovich, and Randall Davis. Sketch Based Interfaces: Early Processing for Sketch Understanding. Workshop on Perceptive User Interfaces, Orlando FL, 2001.
- Tevfik Metin Sezgin and Randall Davis. Early Sketch Processing with Application in HMM Based Sketch Recognition. MIT Computer Science and Artificial Intelligence Laboratory Technical Report AIM-2004-016, July 2004.
Sketch Recognition
A major portion of pen-centric research has revolved around the goal of enabling natural human-computer interaction. We believe progress in recognition techniques is critical to achieving the goal of natural sketch-based interfaces. We need to improve over the existing recognition algorithms in terms of efficiency and recognition accuracy. Our work in recognizing sketches using temporal patterns that naturally appear in online sketching contributes toward addressing these algorithmic issues.
Our analysis of real sketch examples from target user groups has revealed that individuals have personal sketching styles manifested in the form of patterns in temporal stroke orderings (i.e., people tend to use predictable stroke orderings during sketching). Based on this finding, we have developed two algorithms that use ensembles of Hidden Markov Models (HMMs) and Dynamic Bayesian Networks (DBNs) to learn temporal patterns in stroke orderings and perform efficient recognition.
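As a rough illustration of the idea, the snippet below trains one HMM per symbol class on per-stroke feature sequences and classifies a new drawing by maximum likelihood. It is a minimal sketch using the third-party hmmlearn package and toy features, not the ensemble HMM/DBN recognizers described above.

```python
# Minimal sketch (not the lab's implementation): one Gaussian HMM per symbol
# class over per-stroke feature sequences; recognition picks the class whose
# HMM assigns the highest log-likelihood. Requires the `hmmlearn` package.
import numpy as np
from hmmlearn import hmm

def train_class_hmms(sequences_by_class, n_states=3, seed=0):
    """sequences_by_class: {label: [ (T_i, D) arrays of per-stroke features ]}"""
    models = {}
    for label, seqs in sequences_by_class.items():
        X = np.vstack(seqs)                  # concatenated observations
        lengths = [len(s) for s in seqs]     # sequence boundaries
        m = hmm.GaussianHMM(n_components=n_states, covariance_type="diag",
                            n_iter=50, random_state=seed)
        m.fit(X, lengths)
        models[label] = m
    return models

def classify(models, seq):
    # Score the observed stroke-feature sequence under each class HMM.
    return max(models, key=lambda label: models[label].score(seq))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy per-stroke features (e.g., length, curvature) for two fake classes.
    data = {
        "stick_figure": [rng.normal(0.0, 1.0, size=(6, 2)) for _ in range(20)],
        "arrow":        [rng.normal(3.0, 1.0, size=(4, 2)) for _ in range(20)],
    }
    models = train_class_hmms(data)
    print(classify(models, rng.normal(3.0, 1.0, size=(4, 2))))  # likely "arrow"
```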
Selected Publications
- Tevfik Metin Sezgin and Randall Davis. Temporal Sketch Recognition in Interspersed Drawings. Fourth Eurographics Workshop on Sketch-Based Interfaces and Modeling, University of California, Riverside, CA, August 2-3, 2007.
- Tevfik Metin Sezgin. Overview of Recent Work in Pen-Centric Computing: Vision and Research Summary. In Invited Workshop on Pen-Centric Computing Research, Brown University, March 26-28, 2007.
- Tevfik Metin Sezgin and Randall Davis. Sketch Interpretation Using Multiscale Models of Temporal Patterns. In IEEE Computer Graphics and Applications, Volume 27, Issue 1, pp. 28-37, 2007.
- Tevfik Metin Sezgin and Randall Davis. HMM-Based Efficient Sketch Recognition. In Proceedings of the International Conference on Intelligent User Interfaces (IUI ’05), New York, New York, January 9-12, 2005.
- Tevfik Metin Sezgin and Randall Davis. Modeling Sketching as a Dynamic Process. In CSW ’05, Gloucester, MA, 2005.
- Tevfik Metin Sezgin and Randall Davis. Efficient search space exploration for sketch recognition. In MIT Computer Science and Artificial Intelligence Laboratory Annual Research Abstract, 2004.
- Tevfik Metin Sezgin and Randall Davis. Early Sketch Processing with Application in HMM Based Sketch Recognition. MIT Computer Science and Artificial Intelligence Laboratory Technical Report AIM-2004-016, July 2004.
- Tevfik Metin Sezgin. Generic and HMM based approaches to freehand sketch recognition. Proceedings of the MIT Student Oxygen Workshop, 2003.
- Tevfik Metin Sezgin. Recognition efficiency issues for freehand sketches. Proceedings of the MIT Student Oxygen Workshop, 2003.
- Tracy Hammond, Metin Sezgin, Olya Veselova, Aaron Adler, Michael Oltmans, Christine Alvarado, and Rebecca Hitchcock. Multi-Domain Sketch Recognition. Proceedings of the 2nd Annual MIT Student Oxygen Workshop, 2002.
- Tevfik Metin Sezgin. Generating Domain Specific Sketch Recognizers From Object Descriptions. Proceedings of the MIT Student Oxygen Workshop, 2002.
- Christine Alvarado, Metin Sezgin, Dana Scott, Tracy Hammond, Zardosht Kasheff, Michael Oltmans, and Randall Davis. A Framework for Multi-Domain Sketch Recognition. In MIT Artificial Intelligence Laboratory Annual Abstract, September 2001.
Readily Deployable Sketch-Based Applications
Part of our current research effort aims to construct and evaluate sketch-based applications for domains where recognition is robust enough to allow the deployment of these systems in real settings. Unlike our work in developing sketch recognition algorithms, in this line of research the emphasis is on building systems that can readily be adopted by the intended audience and immediately integrated into their workflow. Therefore, the focus is on the construction and evaluation of pen-based interfaces for domains that are simple enough to yield reasonably high recognition rates with the current state of the art in sketch recognition. Graphs and Course of Action Diagrams are two such domains.
Graph Manipulation
Along with collaborators, we developed and evaluated an application that allows computer science students to draw and interact with directed and undirected graphs using a pen-based interface. The recognition engine of this application used a variety of methods including Kohonen networks, iterative closest point, and parallel sampling algorithms for recognizing user-drawn graphs and digits. Watch these clips to see this tool in action: [Clip 1] [Clip 2]. You’ll need the Camtasia codec to play the videos.
Course of Action Diagram Recognition
Course of action diagrams are drawings constructed by military commanders to depict military scenarios (e.g., locations and movements of friendly and enemy units). They are typically drawn by hand on layers of acetate overlaid on top of maps. We are currently working on systems that can recognize course of action diagrams as they are drawn. This is a three-year project, and at the moment there is funding for two PhD students. We’re also looking for summer interns to work on related projects.
This project is in collaboration with Dr. Hammond from Texas A&M University and Dr. Alvarado from Harvey Mudd College, USA.
Selected Publications
- Blessing, T. M. Sezgin, R. Arandjelovic, P. Robinson. A multimodal interface for road design. Workshop on Sketch Recognition, International Conference on Intelligent User Interfaces, Sanibel, FL, February 2009.
- Hamdi Dibeklioglu, Tevfik Metin Sezgin and Ender Ozcan. A Recognizer for Free-Hand Graph Drawings. In International Workshop on Pen-Based Learning Technologies, Catania, Italy, May 24-25, 2007.
- Tevfik Metin Sezgin. Overview of Recent Work in Pen-Centric Computing: Vision and Research Summary. In Invited Workshop on Pen-Centric Computing Research, Brown University, March 26-28, 2007.
Affective Computing and Applications
In collaboration with colleagues from University of Cambridge, we are exploring ways of animating avatars to display emotions as people do. Our primary interest is in applications of machine learning for inferring people’s affective state and affective animation of avatars.
Driver Monitoring and Intelligent Interfaces for Automobiles
Automatic recognition of drivers’ affective state has received interest as a potential source of information for in-car driver monitoring systems. Although there have been studies describing the use of relatively invasive physiological measurements and expensive eye tracking information, facial appearance data has not been explored as much. We have investigated ways of inferring physical and mental states of drivers from video data. We compiled a video corpus by recording drivers subjected to a set of controlled driving conditions in a driving simulator. We are currently exploring ways of automatically processing the video data to facilitate higher fidelity annotation and mental state recognition. We’re looking for MS and PhD students to work on related projects.
Publications
- S. Afzal, T. M. Sezgin, Y. Gao, P. Robinson. Perception of Emotional Expressions through Facial Feature Points. International Conference on Affective Computing and Intelligent Interaction, Amsterdam, Netherlands, September 10-11, 2009.
- Y. Gao, T. M. Sezgin, N. Dodgson. Automatic construction of 3D animatable facial models. International Conference on Computer Animation and Social Agents, Amsterdam, Netherlands, June 17-19, 2009.
- T. M. Sezgin, I. Davies, P. Robinson. Multimodal inference for driver-vehicle interaction. Workshop on Multimodal Interfaces for Automotive Applications, International Conference on Intelligent User Interfaces, Sanibel, FL, February 2009.
- Tevfik Metin Sezgin, Peter Robinson. Affective Video Data Collection Using an Automobile Simulator. Second International Conference on Affective Computing and Intelligent Interaction, Lisbon, Portugal, September 12-14, 2007.
- Xueni Pan, Marco Gillies, Tevfik Metin Sezgin, Celine Loscos. Expressing Complex Mental States Through Facial Expressions. Second International Conference on Affective Computing and Intelligent Interaction, Lisbon, Portugal, September 12-14, 2007.
People
Prof. Dr. T. Metin Sezgin
Koc University, Associate Professor
Yale University, Visiting Fellow
Previous Affiliations
Harvard University, Visiting Professor
University of Cambridge, Postdoctoral Associate
Education
Ph.D., Massachusetts Institute of Technology
M.Sc., Massachusetts Institute of Technology
B.Sc., Syracuse University
Prof. Dr. Sezgin graduated summa cum laude with Honors from Syracuse University in 1999. He received his MS in 2001 and his PhD in 2006, both from Massachusetts Institute of Technology. He joined the University of Cambridge Computer Laboratory as a Postdoctoral Research Associate in 2006. He joined Koc University in 2009, and is currently an Associate Professor at the Department of Computer Engineering.
Dr. Sezgin leads the Intelligent User Interfaces Lab at Koç University. His main research goal is enabling people to interact with computers in a more natural fashion by combining techniques from machine learning, computer vision, computer graphics and human-computer interaction.
Email: mtsezgin@ku.edu.tr
Gallery
Videos
Links
Journals
Interactive Journal of Creative Interfaces and Computer Graphics
IEEE Transactions on Pattern Analysis and Machine Intelligence
ACM Transactions on Interactive Intelligent Systems (TiiS)
IEEE Transactions on Affective Computing
IEEE Computer Graphics & Applications
Journal on Multimodal User Interfaces
Presentations
Reading Group
Research Groups
Summer Research Reports
Blog
A Comprehensive Gold Standard and Benchmark for Comics Text Detection and Recognition
This study focuses on improving the optical character recognition (OCR) data for panels in the COMICS dataset, the largest dataset containing text and images from comic books. To do this, we developed a pipeline for OCR processing and labeling of comic books and created the first text detection and recognition datasets for western comics, called “COMICS Text+: Detection” and “COMICS Text+: Recognition”. We evaluated the performance of state-of-the-art text detection and recognition models on these datasets and found significant improvement in word accuracy and normalized edit distance compared to the text in COMICS. We also created a new dataset called “COMICS Text+”, which contains the extracted text from the textboxes in the COMICS dataset. Using the improved text data of COMICS Text+ in an existing comics processing model resulted in state-of-the-art performance on cloze-style tasks without changing the model architecture. The COMICS Text+ dataset can be a valuable resource for researchers working on tasks including text detection, recognition, and high-level processing of comics, such as narrative understanding, character relations, and story generation. All the data and inference instructions can be accessed at https://github.com/gsoykan/comics_text_plus.
Key words: Comics text dataset, OCR on comics, The Golden Age of Comics, Text detection on comics, Text recognition on comics
Authors: Soykan, G., Yuret, D., and T. M. Sezgin
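As a side note, the two evaluation metrics mentioned above, word accuracy and normalized edit distance, can be computed as in the small illustrative snippet below; this is not the COMICS Text+ evaluation code, just the generic form of these metrics.

```python
# Illustrative only: word accuracy and normalized edit distance for OCR output.
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,            # deletion
                           cur[j - 1] + 1,         # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def evaluate(predictions, references):
    word_acc = sum(p == r for p, r in zip(predictions, references)) / len(references)
    ned = sum(levenshtein(p, r) / max(len(p), len(r), 1)
              for p, r in zip(predictions, references)) / len(references)
    return word_acc, ned

if __name__ == "__main__":
    preds = ["POW!", "He's gone", "KRAKOOM"]
    refs  = ["POW!", "He is gone", "KRAKOOM"]
    acc, ned = evaluate(preds, refs)
    print(f"word accuracy={acc:.2f}, normalized edit distance={ned:.2f}")
```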
The eHRI database: a multimodal database of engagement in human–robot interactions
We present the engagement in human–robot interaction (eHRI) database containing natural interactions between two human participants and a robot under a story-shaping game scenario. The audio-visual recordings provided with the database are fully annotated at a 5-intensity scale for head nods and smiles, as well as with speech transcription and continuous engagement values. In addition, we present baseline results for the smile and head nod detection along with a real-time multimodal engagement monitoring system. We believe that the eHRI database will serve as a novel asset for research in affective human–robot interaction by providing raw data, annotations, and baseline results.
Key words: Engagement, Gesture, Multimodal Data, Human-Robot Interaction
Authors: Kesim E. ,Numanoglu T. ,Bayramoglu O. ,Turker B. B. ,Hussain N. ,Yemez Y. ,Erzin E, T. M. Sezgin
Haptic Negotiation and Role Exchange for Collaboration in Virtual Environments
We investigate how collaborative guidance can be realized in multimodal virtual environments for dynamic tasks involving motor control. Haptic guidance in our context can be defined as any form of force/tactile feedback that the computer generates to help a user execute a task in a faster, more accurate, and subjectively more pleasing fashion. In particular, we are interested in determining guidance mechanisms that best facilitate task performance and arouse a natural sense of collaboration. We suggest that a haptic guidance system can be further improved if it is supplemented with a role exchange mechanism, which allows the computer to adjust the forces it applies to the user in response to his/her actions. Recent work on collaboration and role exchange has presented new perspectives on defining roles and interaction. However, existing approaches mainly focus on relatively basic environments where the state of the system can be defined with a few parameters. We designed and implemented a complex and highly dynamic multimodal game for testing our interaction model. Since the state space of our application is complex, role exchange needs to be implemented carefully. We defined a novel negotiation process, which facilitates dynamic communication between the user and the computer, and realizes the exchange of roles using a three-state finite state machine. Our preliminary results indicate that even though the negotiation and role exchange mechanism we adopted does not improve performance by every evaluation criterion, it introduces a more personal and humanlike interaction model.
Key words: Human Factors; Evaluation/Methodology; Haptic I/O; Haptic User Interfaces; Haptic Guidance; Dynamic Systems and Control; Multimodal Systems; Virtual Environment Modeling; Human-computer interaction; Collaboration
Authors: A. Kucukyilmaz, S. O. Oguz, T. M. Sezgin, C. Basdogan
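To make the three-state negotiation idea described above concrete, here is a toy finite state machine in Python; the state names, thresholds, and transition rules are hypothetical placeholders rather than the ones defined in the paper.

```python
# Toy sketch of a three-state negotiation machine for role exchange.
# All state names and transition rules below are hypothetical placeholders.
from enum import Enum, auto

class Role(Enum):
    HUMAN_LEADS = auto()
    NEGOTIATING = auto()
    COMPUTER_LEADS = auto()

class RoleExchangeFSM:
    def __init__(self, force_threshold=5.0, patience=10):
        self.state = Role.COMPUTER_LEADS
        self.force_threshold = force_threshold  # user force that signals intent (N)
        self.patience = patience                # ticks to wait while negotiating
        self._ticks = 0

    def step(self, user_force: float) -> Role:
        """Advance one control tick given the magnitude of the user's force input."""
        if self.state is Role.COMPUTER_LEADS and user_force > self.force_threshold:
            self.state, self._ticks = Role.NEGOTIATING, 0
        elif self.state is Role.NEGOTIATING:
            self._ticks += 1
            if user_force > self.force_threshold and self._ticks >= self.patience:
                self.state = Role.HUMAN_LEADS      # user kept pushing: hand over control
            elif user_force <= self.force_threshold:
                self.state = Role.COMPUTER_LEADS   # user backed off: keep guiding
        elif self.state is Role.HUMAN_LEADS and user_force < 0.1:
            self.state = Role.COMPUTER_LEADS       # user released: resume guidance
        return self.state

if __name__ == "__main__":
    fsm = RoleExchangeFSM()
    for f in [0.0, 6.0] + [6.0] * 12 + [0.0]:
        print(fsm.step(f).name)
```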
The role of roles: Physical cooperation between humans and robots
Since the strict separation of working spaces of humans and robots has experienced a softening due to recent robotics research achievements, close interaction of humans and robots comes rapidly into reach. In this context, physical human–robot interaction raises a number of questions regarding a desired intuitive robot behavior. The continuous bilateral information and energy exchange requires an appropriate continuous robot feedback. Investigating a cooperative manipulation task, the desired behavior is a combination of an urge to fulfill the task, a smooth instant reactive behavior to human force inputs and an assignment of the task effort to the cooperating agents. In this paper, a formal analysis of human–robot cooperative load transport is presented. Three different possibilities for the assignment of task effort are proposed. Two proposed dynamic role exchange mechanisms adjust the robot’s urge to complete the task based on the human feedback. For comparison, a static role allocation strategy not relying on the human agreement feedback is investigated as well. All three role allocation mechanisms are evaluated in a user study that involves large-scale kinesthetic interaction and full-body human motion. Results show tradeoffs between subjective and objective performance measures, stating a clear objective advantage of the proposed dynamic role allocation scheme.
Key words: cooperative manipulation, human feedback, input decomposition, load sharing, kinesthetic interaction
Authors: A. Mortl, M. Lawitzky, A. Kucukyilmaz, T. M. Sezgin, C. Basdogan, S. Hirche.
Analysis of Engagement and User Experience with a Laughter Responsive Social Robot
We explore the effect of laughter perception and response in terms of engagement in human-robot interaction. We designed two distinct experiments in which the robot has two modes: laughter responsive and laughter non-responsive. In responsive mode, the robot detects laughter using a multimodal real-time laughter detection module and invokes laughter as a backchannel to users accordingly. In non-responsive mode, the robot makes no use of laughter detection and thus provides no feedback. In the experimental design, we use a straightforward question-answer based interaction scenario using a back-projected robot head. We evaluate the interactions with objective and subjective measurements of engagement and user experience.
Key words: laughter detection, human-computer interaction, laughter responsive, engagement.
Authors: Bekir Berker Turker, Zana Bucinca, Engin Erzin, Yücel Yemez, Metin Sezgin
Speech Driven Backchannel Generation using Deep Q-Network for Enhancing Engagement in Human-Robot Interaction
We present a novel method for training a social robot to generate backchannels during human-robot interaction. We address the problem within an off-policy reinforcement learning framework, and show how a robot may learn to produce non-verbal backchannels like laughs, when trained to maximize the engagement and attention of the user. A major contribution of this work is the formulation of the problem as a Markov decision process (MDP) with states defined by the speech activity of the user and rewards generated by quantified engagement levels. The problem that we address falls into the class of applications where unlimited interaction with the environment is not possible (our environment being a human) because it may be time-consuming, costly, impracticable or even dangerous in case a bad policy is executed. Therefore, we introduce deep Q-network (DQN) in a batch reinforcement learning framework, where an optimal policy is learned from batch data collected using a more controlled policy. We suggest the use of human-to-human dyadic interaction datasets as a batch of trajectories to train an agent for engaging interactions. Our experiments demonstrate the potential of our method to train a robot for engaging behaviors in an offline manner.
Key Words: human-robot interaction, engagement, backchannels, reinforcement learning.
Authors: Nusrah Hussain, Engin Erzin, T. Metin Sezgin, and Yücel Yemez
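The batch (offline) reinforcement learning setting described above can be pictured with a tiny tabular stand-in: Q-values are fitted by repeatedly replaying a fixed set of recorded transitions instead of interacting with a live user. The states, actions, and rewards below are toy placeholders, and the paper uses a deep Q-network rather than a table.

```python
# Minimal tabular sketch of batch (offline) Q-learning on pre-recorded tuples.
import numpy as np

def batch_q_learning(transitions, n_states, n_actions, gamma=0.9, sweeps=200, lr=0.1):
    """transitions: list of (state, action, reward, next_state) from a fixed dataset."""
    Q = np.zeros((n_states, n_actions))
    for _ in range(sweeps):                       # replay the fixed batch repeatedly
        for s, a, r, s_next in transitions:
            target = r + gamma * Q[s_next].max()  # bootstrapped target
            Q[s, a] += lr * (target - Q[s, a])
    return Q

if __name__ == "__main__":
    # Toy states: 0 = user silent, 1 = user speaking.  Actions: 0 = do nothing,
    # 1 = emit a backchannel (e.g., a laugh).  Reward = +1 when a backchannel
    # during speech "raises engagement", -1 when it interrupts silence.
    batch = [(1, 1, +1.0, 0), (1, 0, 0.0, 1), (0, 1, -1.0, 0), (0, 0, 0.0, 1)] * 25
    Q = batch_q_learning(batch, n_states=2, n_actions=2)
    print("best action while user speaks:", Q[1].argmax())   # expected: 1
```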
Stroke-Based Sketched Symbol Reconstruction and Segmentation
Hand-drawn objects usually consist of multiple semantically meaningful parts. For example, a stick figure consists of a head, a torso, and pairs of legs and arms. Efficient and accurate identification of these subparts promises to significantly improve algorithms for stylization, deformation, morphing and animation of 2D drawings. In this paper, we propose a neural network model that segments symbols into stroke-level components. Our segmentation framework has two main elements: a fixed feature extractor and a Multilayer Perceptron (MLP) network that identifies a component based on the feature. As the feature extractor we utilize the encoder of stroke-rnn, our newly proposed generative Variational Auto-Encoder (VAE) model that reconstructs symbols on a stroke-by-stroke basis. Experiments show that a single encoder can be reused for segmenting multiple categories of sketched symbols with negligible effects on segmentation accuracies. Our segmentation scores surpass existing methodologies on the available small state-of-the-art dataset. Moreover, extensive evaluations on our newly annotated large dataset demonstrate that our framework obtains significantly better accuracies compared to baseline models. We release the dataset to the community.
Key Words: Sketches, Segmentation, Neural Networks
Authors: Kurmanbek Kaiyrbekov, T. Metin Sezgin
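A structural sketch of the two-part pipeline described above, written in PyTorch for illustration only: a frozen sequence encoder (standing in for the pre-trained stroke-rnn VAE encoder) maps each stroke to a fixed-length feature vector, and a small MLP assigns a component label per stroke. The GRU encoder and layer sizes are assumptions, not the authors' architecture.

```python
# Structural sketch only (not the authors' code) of: frozen encoder -> MLP head.
import torch
import torch.nn as nn

class StrokeEncoder(nn.Module):
    """Stand-in for the pre-trained VAE encoder; here just a GRU over (dx, dy, pen)."""
    def __init__(self, feat_dim=64):
        super().__init__()
        self.rnn = nn.GRU(input_size=3, hidden_size=feat_dim, batch_first=True)

    def forward(self, strokes):        # strokes: (batch, T, 3)
        _, h = self.rnn(strokes)
        return h.squeeze(0)            # (batch, feat_dim) fixed-length features

class ComponentMLP(nn.Module):
    def __init__(self, feat_dim=64, n_components=4):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(feat_dim, 128), nn.ReLU(),
                                 nn.Linear(128, n_components))

    def forward(self, feats):
        return self.net(feats)         # component logits per stroke

if __name__ == "__main__":
    encoder, head = StrokeEncoder(), ComponentMLP()
    encoder.requires_grad_(False)      # encoder stays fixed; only the MLP would be trained
    strokes = torch.randn(8, 25, 3)    # 8 strokes, 25 points each
    with torch.no_grad():
        feats = encoder(strokes)
    labels = head(feats).argmax(dim=1)
    print(labels.shape)                # torch.Size([8]) -> one component id per stroke
```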
Generation of 3D Human Models and Animations Using Simple Sketches
Generating 3D models from 2D images or sketches is a widely studied and important problem in computer graphics. We describe the first method to generate a 3D human model from a single sketched stick figure. In contrast to the existing human modeling techniques, our method requires neither a statistical body shape model nor a rigged 3D character model. We exploit Variational Autoencoders to develop a novel framework capable of transitioning from a simple 2D stick figure sketch to a corresponding 3D human model. Our network learns the mapping between the input sketch and the output 3D model. Furthermore, our model learns the embedding space around these models. We demonstrate that our network can generate not only 3D models, but also 3D animations through interpolation and extrapolation in the learned embedding space. Extensive experiments show that our model learns to generate reasonable 3D models and animations.
Key Words: Sketch-based shape modeling, deep learning, 2D sketches, 3D shapes, static and dynamic 3D human models.
Authors: Alican Akman, Yusuf Sahillioğlu, T. Metin Sezgin.
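The interpolation idea behind the animations can be pictured with the toy snippet below: blend two latent codes linearly and decode every intermediate code into a pose. The decode function here is a hypothetical stand-in for the trained VAE decoder, not the published model.

```python
# Toy illustration of latent-space interpolation for generating an animation.
import numpy as np

def interpolate_latents(z_start, z_end, n_frames=10):
    """Linear interpolation in the embedding space, one latent code per frame."""
    ts = np.linspace(0.0, 1.0, n_frames)
    return [(1 - t) * z_start + t * z_end for t in ts]

def decode(z):
    # Placeholder decoder: the real system would map a latent code to 3D mesh
    # vertices; here we just return a fake (V, 3) vertex array.
    rng = np.random.default_rng(abs(int(z.sum() * 1000)) % (2**32))
    return rng.normal(size=(100, 3))

if __name__ == "__main__":
    z_a, z_b = np.random.randn(128), np.random.randn(128)   # codes of two sketches
    animation = [decode(z) for z in interpolate_latents(z_a, z_b, n_frames=24)]
    print(len(animation), animation[0].shape)                # 24 frames of 100x3 vertices
```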
The ASC-Inclusion Perceptual Serious Gaming Platform for Autistic Children
‘Serious games’ are becoming extremely relevant to individuals who have specific needs, such as children with an Autism Spectrum Condition (ASC). Often, individuals with an ASC have difficulties in interpreting verbal and non-verbal communication cues during social interactions. The ASC-Inclusion EU-FP7 funded project aims to provide children who have an ASC with a platform to learn emotion expression and recognition, through play in the virtual world. In particular, the ASC-Inclusion platform focuses on the expression of emotion via facial, vocal, and bodily gestures. The platform combines multiple analysis tools, using on-board microphone and web-cam capabilities. The platform utilises these capabilities via training games, text-based communication, animations, video and audio clips. This paper introduces current findings and evaluations of the ASC-Inclusion platform and provides detailed description for the different modalities.
Index Terms—Autism Spectrum Condition, inclusion, virtual computerised environment, emotion recognition, AI in games.
Authors: Erik Marchi, Bjorn Schuller, Alice Baird, Simon Baron-Cohen, Amandine Lassalle, Helen O’Reilly, Delia Pigat, Peter Robinson, Ian Davies, Tadas Baltrusaitis, Ofer Golan, Shimrit Fridenson-Hayo, Shahar Tal, Shai Newman, Noga Meir-Goren, Antonio Camurri, Stefano Piana, Sven Bolte, Metin Sezgin, Nese Alyuz, Agnieszka Rynkiewicz, Aurelie Baranger
HapTable: An Interactive Tabletop Providing Online Haptic Feedback for Touch Gestures
We present HapTable, a multimodal interactive tabletop that allows users to interact with digital images and objects through natural touch gestures, and receive visual and haptic feedback accordingly. In our system, hand pose is registered by an infrared camera and hand gestures are classified using a Support Vector Machine (SVM) classifier. To display a rich set of haptic effects for both static and dynamic gestures, we integrated electromechanical and electrostatic actuation techniques effectively on the tabletop surface of HapTable, which is a surface capacitive touch screen. We attached four piezo patches to the edges of the tabletop to display vibrotactile feedback for static gestures. For this purpose, the vibration response of the tabletop, in the form of frequency response functions (FRFs), was obtained by a laser Doppler vibrometer for 84 grid points on its surface. Using these FRFs, it is possible to display localized vibrotactile feedback on the surface for static gestures. For dynamic gestures, we utilize the electrostatic actuation technique to modulate the frictional forces between finger skin and tabletop surface by applying voltage to its conductive layer. To our knowledge, this hybrid haptic technology is one of a kind and has not been implemented or tested on a tabletop. It opens up new avenues for gesture-based haptic interaction not only on tabletop surfaces but also on touch surfaces used in mobile devices with potential applications in data visualization, user interfaces, games, entertainment, and education. Here, we present two examples of such applications, one for static and one for dynamic gestures, along with detailed user studies. In the first one, the user detects the direction of a virtual flow, such as that of wind or water, by putting their hand on the tabletop surface and feeling a vibrotactile stimulus traveling underneath it. In the second example, the user rotates a virtual knob on the tabletop surface to select an item from a menu while feeling the knob’s detents and resistance to rotation in the form of frictional haptic feedback.
Index Terms—Electrostatic actuation, gesture recognition, haptic interfaces, human–computer interaction, multimodal systems, vibrotactile haptic feedback
Authors: Senem Ezgi Emgin, Amirreza Aghakhani, T. Metin Sezgin, Cagatay Basdogan
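For readers unfamiliar with the gesture-classification step mentioned above, the snippet below shows the generic pattern (hand-pose feature vectors in, gesture labels out) with scikit-learn's SVC on synthetic data; it is not the HapTable pipeline itself, and the feature layout is made up.

```python
# Illustrative SVM gesture classifier on toy hand-pose features (not HapTable's code).
import numpy as np
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Toy features per frame (e.g., fingertip positions flattened); two fake gestures.
X_flat  = rng.normal(0.0, 1.0, size=(100, 10))   # "flat hand"
X_pinch = rng.normal(2.0, 1.0, size=(100, 10))   # "pinch"
X = np.vstack([X_flat, X_pinch])
y = np.array(["flat"] * 100 + ["pinch"] * 100)

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
clf.fit(X, y)
print(clf.predict(rng.normal(2.0, 1.0, size=(1, 10))))  # likely ['pinch']
```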
Audio-Visual Prediction of Head-Nod and Turn-Taking Events in Dyadic Interactions
Head-nods and turn-taking both contribute significantly to conversational dynamics in dyadic interactions. Timely prediction and use of these events is quite valuable for dialog management systems in human-robot interaction. In this study, we present an audio-visual prediction framework for head-nod and turn-taking events that can also be utilized in real-time systems. Prediction systems based on Support Vector Machines (SVM) and Long Short-Term Memory Recurrent Neural Networks (LSTM-RNN) are trained on human-human conversational data. Unimodal and multi-modal classification performances of head-nod and turn-taking events are reported over the IEMOCAP dataset.
Index Terms: head-nod, turn-taking, social signals, event prediction, dyadic conversations, human-robot interaction
Authors: B. B. Turker, E. Erzin, Y. Yemez and M. Sezgin
Multifaceted Engagement in Social Interaction with a Machine: the JOKER Project
This paper addresses the problem of evaluating engagement of the human participant by combining verbal and nonverbal behaviour along with contextual information. This study will be carried out through four different corpora. Four different systems designed to explore essential and complementary aspects of the JOKER system in terms of paralinguistic/linguistic inputs were used for the data collection. An annotation scheme dedicated to the labeling of verbal and non-verbal behavior has been designed. Our experiments indicate that engagement in HRI should be treated as multifaceted.
Keywords-Human-Robot Interaction; Dataset; Engagement; Speech Recognition; Affective Computing
Authors: L. Devillers and S. Rosset and G. Dubuisson Duplessis and L. Bechade and Y. Yemez and B. B. Turker and M. Sezgin and E. Erzin and K. El Haddad and S. Dupont and P. Deleglise and Y. Esteve and C. Lailler and E. Gilmartin and N. Campbell
Gaze-based predictive user interfaces: Visualizing user intentions in the presence of uncertainty
Human eyes exhibit different characteristic patterns during different virtual interaction tasks such as moving a window, scrolling a piece of text, or maximizing an image. The human-computer interaction literature contains examples of intelligent systems that can predict a user's task-related intentions and goals based on eye gaze behavior. However, these systems are generally evaluated in terms of prediction accuracy, and on previously collected offline interaction data. Little attention has been paid to creating real-time interactive systems that use eye gaze and to evaluating them in online use. We make five main contributions that address this gap from a variety of aspects. First, we present the first line of work that uses real-time feedback generated by a gaze-based probabilistic task prediction model to build an adaptive real-time visualization system. Our system is able to dynamically provide adaptive interventions informed by real-time user behavior data. Second, we propose two novel adaptive visualization approaches that take into account the uncertainty in the outputs of prediction models. Third, we offer a personalization method that suggests which approach will be more suitable for each user in terms of system performance (measured by prediction accuracy). Personalization boosts system performance and provides users with the more suitable visualization approach (measured by usability and perceived task load). Fourth, by means of a thorough usability study, we quantify the effects of the proposed visualization approaches and of prediction errors on natural user behavior and on the performance of the underlying prediction systems. Finally, this paper also demonstrates that our previously published gaze-based task prediction system, which was assessed as successful in an offline test scenario, can also be successfully utilized in realistic online usage scenarios.
Keywords: Implicit interaction, activity prediction, task prediction, uncertainty visualization, gaze-based interfaces, predictive interfaces, proactive interfaces, gaze-contingent interfaces, usability study
Authors: Çağla Çığ and T. M. Sezgin
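One simple way to make the handling of prediction uncertainty concrete is to gate adaptive feedback on the entropy of the task-probability vector produced by a prediction model. The sketch below is a hypothetical illustration of that idea, not the visualization approaches evaluated in the paper.

# Illustrative sketch only: adapt the interface only when a (hypothetical)
# task-prediction model is confident enough, measured by normalized entropy.
import numpy as np

def entropy(p):
    p = np.clip(np.asarray(p, dtype=float), 1e-12, 1.0)
    return -np.sum(p * np.log(p))

def choose_intervention(task_probs, max_entropy_ratio=0.5):
    """Return the predicted task index if confident enough, else None."""
    h_ratio = entropy(task_probs) / np.log(len(task_probs))  # 0 = certain, 1 = uniform
    return int(np.argmax(task_probs)) if h_ratio <= max_entropy_ratio else None

print(choose_intervention([0.9, 0.05, 0.03, 0.02]))  # 0 (confident: adapt)
print(choose_intervention([0.3, 0.3, 0.2, 0.2]))     # None (uncertain: stay neutral)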
Visualization Literacy at Elementary School
This work advances our understanding of children’s visualization literacy, and aims to improve it with a novel approach for teaching visualization at elementary schools. We first contribute an analysis of data graphics and activities employed in grade K to 4 educational materials, and the results of a survey conducted with 16 elementary school teachers. We find that visualization education could benefit from integrating pedagogical strategies for teaching abstract concepts with established interactive visualization techniques. Building on these insights, we develop and study design principles for novel interactive teaching material aimed at increasing children’s visualization literacy. We specifically contribute an online platform for teachers and students to respectively teach and learn about pictographs and bar charts and report on our initial observations of its use in grades K and 2.
Author Keywords: visualization literacy; qualitative analysis.
Authors: B. Alper, N. H. Riche, F. Chevalier, J. Boy and T. M. Sezgin
CHER-ish: A sketch- and image-based system for 3D representation and documentation of cultural heritage sites
Sketch Recognition with Few Examples
Sketch recognition is the task of converting hand-drawn digital ink into symbolic computer representations. Since the early days of sketch recognition, the bulk of the work in the field has focused on building accurate recognition algorithms for specific domains and well-defined data sets. Recognition methods explored so far have been developed and evaluated using standard machine learning pipelines and have consequently been built on many simplifying assumptions. For example, existing frameworks assume the presence of a fixed set of symbol classes and the availability of plenty of annotated examples. However, in practice, these assumptions do not hold. In reality, the designer of a sketch recognition system starts with no labeled data at all and faces the burden of data annotation. In this work, we propose to alleviate the burden of annotation by building systems that can learn from very few labeled examples and large amounts of unlabeled data. Our systems perform self-learning by automatically extending a very small set of labeled examples with new examples extracted from unlabeled sketches. The end result is a sufficiently large set of labeled training data, which can subsequently be used to train classifiers. We present four self-learning methods with varying levels of implementation difficulty and runtime complexity. One of these methods leverages contextual co-occurrence patterns to build a verifiably more diverse set of training instances. Rigorous experiments with large sets of data demonstrate that this novel approach based on exploiting contextual information leads to significant leaps in recognition performance. As a side contribution, we also demonstrate the utility of bagging for sketch recognition in imbalanced data sets with few positive examples and many outliers.
Authors: K. T. Yeşilbek, T. M. Sezgin.
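For readers who want a concrete starting point, the following is a minimal sketch of a generic self-training loop in the spirit described above, using scikit-learn's SelfTrainingClassifier on synthetic data; the sketch-specific features and the contextual co-occurrence variant from the paper are not reproduced.

# Illustrative sketch only: grow a tiny labeled set by pseudo-labeling confident
# predictions on unlabeled data, then train on the expanded set.
import numpy as np
from sklearn.svm import SVC
from sklearn.semi_supervised import SelfTrainingClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 30))                  # synthetic sketch feature vectors
y_true = (X[:, 0] > 0).astype(int)              # synthetic ground truth
y = np.full(500, -1)                            # -1 marks unlabeled examples
labeled = rng.choice(500, size=10, replace=False)
y[labeled] = y_true[labeled]                    # start from only 10 labels

base = SVC(kernel="rbf", gamma="scale", probability=True)
model = SelfTrainingClassifier(base, threshold=0.9).fit(X, y)
print("accuracy on all data:", model.score(X, y_true))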
Characterizing User Behavior for Speech and Sketch-based Video Retrieval Interfaces
From a user interaction perspective, speech and sketching make a good couple for describing motion. Speech allows easy specification of content, events and relationships, while sketching brings in spatial expressiveness. Yet, we have insufficient knowledge of how sketching and speech can be used for motion-based video retrieval, because there are no existing retrieval systems that support such interaction. In this paper, we describe a Wizard-of-Oz protocol and a set of tools that we have developed to engage users in a sketch- and speech-based video retrieval task. We report how the tools and the protocol fit together using "retrieval of soccer videos" as a use case scenario. Our software is highly customizable, and our protocol is easy to follow. We believe that together they will serve as a convenient and powerful duo for studying a wide range of multi-modal use cases.
Keywords: sketch-based interfaces, human-centered design, motion, multimedia retrieval
Authors: O. C. Altıok, T. M. Sezgin.
Analysis of Engagement and User Experience with a Laughter Responsive Social Robot
The robot operates in two modes: laughter responsive and laughter non-responsive. In responsive mode, the robot detects laughter using a multimodal real-time laughter detection module and invokes laughter as a backchannel to users accordingly. In non-responsive mode, the robot does not use laughter detection and thus provides no such feedback. In the experimental design, we use a straightforward question-answer based interaction scenario using a back-projected robot head. We evaluate the interactions with objective and subjective measurements of engagement and user experience.
Keywords: laughter responsive, engagement
CHI Paper receives Honorable Mention (top 5% of CHI papers)
Visualization Literacy at Elementary School, B. Alper, N. Riche, F. Chevalier, J. Boy, T. M. Sezgin.
Material Design in Augmented Reality with In-Situ Visual Feedback
Material design is the process by which artists or designers set the appearance properties of a virtual surface to achieve a desired look. This process is often conducted in a purely virtual synthetic environment; however, advances in computer vision tracking and interactive rendering now make it possible to design materials in augmented reality (AR) environments instead. How designing in an AR environment affects user behavior, however, is unknown. To evaluate how working in a real environment influences the material design process, we propose a novel material design interface that allows designers
to interact with a tangible object as they specify appearance properties. The setup gives designers the opportunity to view the real-time rendering of appearance properties through a virtual reality setup as they manipulate the object. Our setup uses a camera to capture the physical surroundings of the designer to create subtle but realistic reflection effects on the virtual view superimposed on the tangible object. The effects are based on the physical lighting conditions of the actual design space. We describe a user study that compares the efficacy of our method to that of a traditional 3D virtual synthetic material design system. Both subjective feedback and quantitative analysis from our study suggest that the in-situ experience provided by our setup allows the creation of higher quality material properties and supports the sense of interaction and immersion.
Authors: W. Shi, Z. Wang, T. M. Sezgin, J. Dorsey, H. Rushmeier.
Sketch-Based Articulated 3D Shape Retrieval
Sketch-based queries are a suitable and superior alternative to traditional text- and example-based queries for 3D shape retrieval. The authors developed an articulated 3D shape retrieval method that uses easy-to-obtain 2D sketches. It does not require 3D example models to initiate queries but achieves accuracy comparable to a state-of-the-art example-based 3D shape retrieval method.
Authors: Y. Sahillioglu, T. M. Sezgin.
What Auto Completion Tells Us About Sketch Recognition
Auto completion is generally considered to be a difficult problem in sketch recognition, as it requires a decision to be made with fewer strokes. Therefore, it is generally assumed that the classification of fully completed object sketches should yield higher accuracy rates. In this paper, we report results from a comprehensive study demonstrating that the first few strokes of an object are more important than the last ones drawn. Once the first few critical strokes of a symbol are observed, recognition accuracies reach a plateau and may even decline. This indicates that less is more in sketch recognition. Our results are supported by carefully designed computational experiments using Tirkaz et al.'s sketch auto completion framework on the dataset of everyday object sketches collected by Eitz et al.
Authors: O. C. Altıok, K. T. Yesilbek, T. M. Sezgin.
Building a Gold Standard for Perceptual Sketch Similarity
Similarity is among the most basic concepts studied in psychology. Yet, there is no unique way of assessing similarity of two objects. In the sketch recognition domain, many tasks such as classification, detection or clustering require measuring the level of similarity between sketches. In this paper, we propose a carefully designed experiment setup to construct a gold standard for measuring the similarity of sketches. Our setup is based on table scaling, and allows efficient construction of a measure of similarity for large datasets containing hundreds of sketches in reasonable time scales. We report the results of an experiment involving a total of 9 unique assessors, and 8 groups of sketches, each containing 300 drawings. The results show high interrater agreement between the assessors, which makes the constructed gold standard trustworthy.
Authors: Serike Cakmak, T. Metin Sezgin.
Gaze-Based Biometric Authentication: Hand-Eye Coordination Patterns as a Biometric Trait
We propose a biometric authentication system for pointer-based systems including, but not limited to, increasingly prominent pen-based mobile devices. To unlock a mobile device equipped with our biometric authentication system, all the user needs to do is manipulate a virtual object presented on the device display. The user can select among a range of familiar manipulation tasks, namely drag, connect, maximize, minimize, and scroll. These simple tasks take around 2 seconds each and do not require any prior education or training [ÇS15]. More importantly, we have discovered that each user has a characteristic way of performing these tasks. Features that express these characteristics are hidden in the user's accompanying hand-eye coordination, gaze, and pointer behaviors. For this reason, as the user performs any selected task, we collect his/her eye gaze and pointer movement data using an eye gaze tracker and a pointer-based input device (e.g., a pen, stylus, finger, mouse, or joystick), respectively. Then, we extract meaningful and distinguishing features from this multimodal data to summarize the user's characteristic way of performing the selected task. Finally, we authenticate the user through three layers of security: (1) the user must have performed the manipulation task correctly (e.g., by drawing the correct pattern), (2) the user's hand-eye coordination and gaze behaviors while performing this task should conform to his/her hand-eye coordination and gaze behavior model in the database, and (3) the user's pointer behavior while performing this task should conform to his/her pointer behavior model in the database.
Authors: Cagla Cig, T. Metin Sezgin.
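A minimal sketch of the three-layer decision described above, with hypothetical inputs standing in for the task-correctness check and the scores produced by per-user gaze and pointer behaviour models:

# Illustrative sketch only: the three layers of security as a simple decision rule.
# task_ok, gaze_score and pointer_score are hypothetical outputs of upstream models.
def authenticate(task_ok, gaze_score, pointer_score,
                 gaze_threshold=0.7, pointer_threshold=0.7):
    if not task_ok:                        # layer 1: manipulation task done correctly
        return False
    if gaze_score < gaze_threshold:        # layer 2: hand-eye/gaze model match
        return False
    if pointer_score < pointer_threshold:  # layer 3: pointer behaviour model match
        return False
    return True

print(authenticate(True, 0.82, 0.91))      # True
print(authenticate(True, 0.40, 0.91))      # False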
Semantic Sketch-Based Video Retrieval with Autocompletion
The IMOTION system is a content-based video search engine that provides fast and intuitive known item search in large video collections. User interaction consists mainly of sketching, which the system recognizes in real-time and makes suggestions based on both visual appearance of the sketch (what does the sketch look like in terms of colors, edge distribution, etc.) and semantic content (what object is the user sketching). The latter is enabled by a predictive sketch-based UI that identifies likely candidates for the sketched object via state-of-the-art sketch recognition techniques and offers on-screen completion suggestions. In this demo, we show how the sketch-based video retrieval of the IMOTION system is used in a collection of roughly 30,000 video shots. The system indexes collection data with over 30 visual features describing color, edge, motion, and semantic information. Resulting feature data is stored in ADAM, an efficient database system optimized for fast retrieval.
Authors: C. Tanase, I. Giangreco, L. Rossetto, H. Schuldt, O. Seddati, S. Dupont, O. C. Altıok, T. M. Sezgin
IMOTION – Searching for Video Sequences using Multi-Shot Sketch Queries
This paper presents the second version of the IMOTION system, a sketch-based video retrieval engine supporting multiple query paradigms. Since its first version, IMOTION has supported the search for video sequences on the basis of still images, user-provided sketches, or the specification of motion via flow fields. For the second version, the functionality and the usability of the system have been improved. It now supports multiple input images (such as sketches or still frames) per query, as well as the specification of objects to be present within the target sequence. The results are grouped either by video or by sequence, and the support for selective and collaborative retrieval has been improved. Special features have been added to encapsulate semantic similarity.
Authors: L. Rossetto, I. Giangreco, S. Heller, C. Tanase, H. Schuldt, O. Seddati, S. Dupont, T. M. Sezgin, O. C. Altıok, Y. Sahillioglu.
iAutoMotion – an Autonomous Content-based Video Retrieval Engine
This paper introduces iAutoMotion, an autonomous video retrieval system that requires only minimal user input. It is based on the video retrieval engine IMOTION. iAutoMotion uses a camera to capture the input for both visual and textual queries and performs query composition, retrieval, and result submission autonomously. For the visual tasks, it uses various visual features applied to the captured query images; for the textual tasks, it applies OCR and some basic natural language processing, combined with object recognition. As the iAutoMotion system does not conform to the VBS 2016 rules, it will participate as an unofficial competitor and serve as a benchmark for the manually operated systems.
Authors: L. Rossetto, I. Giangreco, C. Tanase, H. Schuldt, O. Seddati, S. Dupont, T. M. Sezgin, Y. Sahillioglu.
Multimodal Data Collection of Human-Robot Humorous Interactions in the JOKER Project
Thanks to its remarkable ability to convey amusement and engagement, laughter is one of the most important social markers in human interactions. Laughing together can help set up a positive atmosphere and favors the creation of new relationships. This paper presents a data collection of social interaction dialogs involving humor between a human participant and a robot. In this work, interaction scenarios have been designed in order to study social markers such as laughter. They have been implemented within two automatic systems developed in the JOKER project: a social dialog system using paralinguistic cues and a task-based dialog system using linguistic content. One of the major contributions of this work is to provide a context for studying human laughter produced during a human-robot interaction. The collected data will be used to build a generic intelligent user interface which provides a multimodal dialog system with social communication skills including humor and other informal socially oriented behaviors. This system will emphasize the fusion of verbal and non-verbal channels for emotional and social behavior perception, interaction and generation capabilities.
Keywords: Multimodal Data, Human-Robot Interaction, Humorous Robot
Authors: L. Devillers, S. Rosset, G. Dubuisson Duplessis, M. A. Sehili, L. Béchade, A. Delaborde, C. Gossart, V. Letard, F. Yang, Y. Yemez, B. B. Türker, M. Sezgin, K. El Haddad, S. Dupont, D. Luzzati, Y. Estève, E. Gilmartin, N. Campbell.
Recent developments and results of ASC-Inclusion: An Integrated Internet-Based Environment for Social Inclusion of Children with Autism Spectrum Conditions
Individuals with Autism Spectrum Conditions (ASC) have marked difficulties using verbal and non-verbal communication for social interaction. The ASC-Inclusion project helps children with ASC by allowing them to learn how emotions can be expressed and recognised via playing games in a virtual world. The platform assists children with ASC to understand and express emotions through facial expressions, tone-of-voice and body gestures. In fact, the platform combines several state-of-the-art technologies in one comprehensive virtual world, including analysis of users' gestures, facial, and vocal expressions using a standard microphone and web-cam, training through games, text communication with peers and smart agents, animation, video and audio clips. We present the recent findings and evaluations of such a serious game platform and provide results for the different modalities.
Authors: B. Schuller, E. Marchi, S. Baron-Cohen, A. Lassalle, H. O’Reilly, D. Pigat, P. Robinson, I. Davies, T. Baltrusaitis, M. Mahmoud, O. Golan, S. Fridenson, S. Tal, S. Newman, N. Meir, R. Shillo, A. Camurri, S. P., A. Staglianò, S. Bölte, D. Lundqvist, S. Berggren, A. Baranger, N. Sullings, T. M Sezgin, N. Alyuz, A. Rynkiewicz, K. Ptaszek, K. Ligmann.
Active learning for sketch recognition
The increasing availability of pen-based tablets and pen-based interfaces has opened the avenue for computer graphics applications that can utilize sketch recognition technologies for natural interaction. This has led to an increasing interest in sketch recognition algorithms within the computer graphics community. However, a key problem getting in the way of building accurate sketch recognizers has been the necessity of creating large amounts of annotated training data. Several authors have attempted to address this issue by creating synthetic data, or by building easy-to-use annotation tools. In this paper, we take a different approach, and demonstrate that active learning can be used to reduce the amount of manual annotation required to achieve a target recognition accuracy. In particular, we show that by annotating few, but carefully selected, examples, we can surpass accuracies achievable with an equal number of arbitrarily selected examples. This work is the first comprehensive study on the use of active learning for sketch recognition. We present results of extensive analyses and show that the utility of active learning depends on a number of practical factors that require careful consideration. These factors include the choice of informativeness measure, batch selection strategy, and seed size, as well as domain-specific factors such as the feature representation and the choice of database. Our results imply that the margin-based informativeness measure consistently outperforms other measures. We also show that active learning brings definitive advantages in challenging databases when accompanied by powerful feature representations.
Authors: Erelcan Yanik, Tevfik Metin Sezgin.
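As a concrete illustration of margin-based informativeness, the sketch below shows one round of sample selection with an SVM on synthetic data; batch strategies, seed selection, and the sketch features studied in the paper are not reproduced.

# Illustrative sketch only: pick the unlabeled examples with the smallest margin
# between the top two predicted class probabilities and send them for annotation.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X_labeled = rng.normal(size=(20, 10))          # small seed set (synthetic)
y_labeled = rng.integers(0, 2, size=20)
X_pool = rng.normal(size=(200, 10))            # unlabeled pool

clf = SVC(kernel="rbf", gamma="scale", probability=True).fit(X_labeled, y_labeled)
proba = clf.predict_proba(X_pool)
top2 = np.sort(proba, axis=1)[:, -2:]          # two highest class probabilities
margin = top2[:, 1] - top2[:, 0]               # small margin = most informative
query_idx = np.argsort(margin)[:5]
print("indices to annotate next:", query_idx)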
HaptiStylus: A Novel Stylus Capable of Displaying Movement and Rotational Torque Effects
With the emergence of pen-enabled tablets and mobile devices, stylus-based interaction has
been receiving increasing attention. Unfortunately, styluses available in the market today are all passive
instruments that are primarily used for writing and pointing. In this paper, we describe a novel stylus
capable of displaying certain vibrotactile and inertial haptic effects to the user. Our stylus is equipped with two vibration actuators at the ends, which are used to create a tactile sensation of up and down movement along the stylus. The stylus is also embedded with a DC motor, which is used to create a sense of bidirectional rotational torque about the long axis of the pen. Through two psychophysical experiments, we show that, when driven with carefully selected timing and actuation patterns, our haptic stylus can convey movement and rotational torque information to the user. Results from a further psychophysical experiment provide insight on how the shape of the actuation patterns affects the perception of rotational torque. Finally, experimental results from our interactive pen-based game show that our haptic stylus is effective in practical settings.
Authors: Atakan Arasan, Cagatay Basdogan, T. Metin Sezgin
Identifying visual attributes for object recognition from text and taxonomy
Attributes of objects such as "square", "metallic", and "red" provide a way for humans to explain or discriminate between object categories. These attributes also provide a useful intermediate representation for object recognition, including support for zero-shot learning from textual descriptions of object appearance. However, manual selection of relevant attributes among thousands of potential candidates is labor intensive. Hence, there is increasing interest in mining attributes for object recognition. In this paper, we introduce two novel techniques for nominating attributes and a method for assessing the suitability of candidate attributes for object recognition. The first technique for attribute nomination estimates attribute qualities based on their ability to discriminate objects at multiple levels of the taxonomy. The second technique leverages the linguistic concept of distributional similarity to further refine the estimated qualities. Attribute nomination is followed by our attribute assessment procedure, which assesses the quality of the candidate attributes based on their performance in object recognition. Our evaluations demonstrate that both taxonomy and distributional similarity serve as useful sources of information for attribute nomination, and our methods can effectively exploit them. We use the mined attributes in supervised and zero-shot learning settings to show the utility of the selected attributes in object recognition. Our experimental results show that in the supervised case we can improve on a state-of-the-art classifier, while in the zero-shot scenario we make accurate predictions outperforming previous automated techniques.
Authors: Caglar Tirkaz, Jacob Eisenstein, T. Metin Sezgin and Berrin Yanikoglu.
IMOTION — A Content-Based Video Retrieval Engine
This paper introduces the IMOTION system, a sketch-based video retrieval engine supporting multiple query paradigms. For vector space retrieval, the IMOTION system exploits a large variety of low-level image and video features, as well as high-level spatial and temporal features that can all be jointly used in any combination. In addition, it supports dedicated motion features to allow for the specification of motion within a video sequence. For query specification, the IMOTION system supports query-by-sketch interactions (users provide sketches of video frames), motion queries (users specify motion across frames via partial flow fields), query-by-example (based on images) and any combination of these, and provides support for relevance feedback.
Authors: Luca Rossetto, Ivan Giangreco, Heiko Schuldt, Stéphane Dupont, Omar Seddati, T. Metin Sezgin, Yusuf Sahillioglu
Real-Time Activity Prediction: A Gaze-Based Approach for Early Recognition of Pen-Based Interaction Tasks
Recently there has been a growing interest in sketch recognition technologies for facilitating human-computer interaction. Existing sketch recognition studies mainly focus on recognizing pre-defined symbols and gestures. However, just as there is a need for systems that can automatically recognize symbols and gestures, there is also a pressing need for systems that can automatically recognize pen-based manipulation activities (e.g. dragging, maximizing, minimizing, scrolling). There are two main challenges in classifying manipulation activities. First is the inherent lack of characteristic visual appearances of pen inputs that correspond to manipulation activities. Second is the necessity of real-time classification based upon the principle that users must receive immediate and appropriate visual feedback about the effects of their actions. In this paper (1) an existing activity prediction
system for pen-based devices is modified for real-time activity prediction and (2) an alternative time-based activity prediction system is introduced. Both systems use eye gaze movements that naturally accompany pen-based user interaction for activity classification. The results of our comprehensive experiments demonstrate that the newly developed alternative system is a more successful candidate (in terms of prediction accuracy and early prediction speed) than the existing system for real-time activity prediction. More specifically, midway through an activity, the alternative system reaches 66% of its maximum accuracy value (i.e. 66% of 70.34%) whereas the existing system reaches only 36% of its maximum accuracy value (i.e. 36% of 55.69%).
Authors: Cagla Cig and T. Metin Sezgin. Accepted for publication in Expressive 2015.
SVM-based Sketch Recognition: Which Hyperparameter Interval to Try?
Hyperparameters are among the most crucial factors that affect the performance of machine learning algorithms. In general, there is no direct method for determining a set of satisfactory parameters, so hyperparameter search needs to be conducted each time a model is to be trained. In this work, we analyze how similar hyperparameters perform across various datasets from the sketch recognition domain. Results show that hyperparameter search space can be reduced to a subspace despite differences in characteristics of datasets.
Authors: Kemal Tugrul Yesilbek, Cansu Sen, Serike Cakmak and T. Metin Sezgin. Accepted for publication in Expressive 2015.
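For readers reproducing this kind of analysis, a coarse log-spaced grid over (C, gamma) is a common starting point; the sketch below uses scikit-learn on synthetic data and does not reproduce the datasets or the reduced search subspace reported in the paper.

# Illustrative sketch only: a standard log-spaced grid search over SVM
# hyperparameters, of the kind whose intervals the study analyses across datasets.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=200, n_features=20, random_state=0)
param_grid = {"C": np.logspace(-2, 3, 6), "gamma": np.logspace(-4, 1, 6)}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5).fit(X, y)
print("best params:", search.best_params_)
print("best CV accuracy:", round(search.best_score_, 3))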
Recognition of Haptic Interaction Patterns in Dyadic Joint Object Manipulation
The development of robots that can physically cooperate with humans has attracted interest in recent decades. Obviously, this effort requires a deep understanding of the intrinsic properties of interaction. Up to now, many researchers have focused on inferring human intents in terms of intermediate or terminal goals in physical tasks. On the other hand, to work side by side with people, an autonomous robot additionally needs in-depth information about the underlying haptic interaction patterns that are typically encountered during human-human cooperation. However, to our knowledge, no study has yet focused on characterizing such detailed information. In this sense, this work is pioneering as an effort to gain deeper understanding of interaction patterns involving two or more humans in a physical task. We present a labeled human-human-interaction dataset, which captures the interaction of two humans who collaboratively transport an object in a haptics-enabled virtual environment. In the light of information gained by studying this dataset, we propose that the actions of cooperating partners can be examined under three interaction types: in any cooperative task, the interacting humans either 1) work in harmony, 2) cope with conflicts, or 3) remain passive during interaction. In line with this conception, we present a taxonomy of human interaction patterns; we then propose five different feature sets, comprising force-, velocity- and power-related information, for the classification of these patterns. Our evaluation shows that using a multi-class support vector machine (SVM) classifier, we can accomplish a correct classification rate of 86 percent for the identification of interaction patterns, an accuracy obtained by fusing a selected set of the most informative features with the Minimum Redundancy Maximum Relevance (mRMR) feature selection method.
Authors: Cigil Ece Madan, Ayse Kucukyilmaz, Tevfik Metin Sezgin, and Cagatay Basdogan. Accepted for publication in IEEE Transactions on Haptics.
Outstanding Young Scientist Award
Dr. Sezgin has received the Outstanding Young Scientist Award from the Turkish Academy of Sciences (TÜBA-GEBİP).
ASC Inclusion Final Meeting
Dr. Sezgin presented the Lab's work on formative assessment at the ASC-Inclusion final project review meeting in Luxembourg.
TUBITAK Award
Dr. Sezgin has been awarded a TUBITAK 1003 Grant. The grant will support research in development and evaluation of pen-based intelligent user interfaces for eLearning in the context of the FATIH initiative. Of hundreds of initial proposals submitted to the 1003 call, only 67 made it past the first round, and only nine were eventually funded at the end of the second round reviews.
New Publication in Computer Physics Communications
Sezgin & Sezgin's article on finding portable congruential random number generators has been accepted for publication in the Computer Physics Communications journal.
DPFrag: Trainable Stroke Fragmentation Based on Dynamic Programming
Many computer graphics applications must fragment freehand curves into sets of prespecified geometric primitives. For example, sketch recognition typically converts hand-drawn strokes into line and arc segments and then combines these primitives into meaningful symbols for recognizing drawings. However, current fragmentation methods’ shortcomings make them impractical. For example, they require manual tuning, require excessive computational resources, or produce suboptimal solutions that rely on local decisions. DPFrag is an efficient, globally optimal fragmentation method that learns segmentation parameters from data and produces fragmentations by combining primitive recognizers in a dynamic-programming framework. The fragmentation is fast and doesn’t require laborious and tedious parameter tuning. In experiments, it beat state-of-the-art methods on standard databases with only a handful of labeled examples.
Authors: R. Sinan Tümen and T. Metin Sezgin
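To give a flavour of the dynamic-programming formulation, here is a minimal sketch of generic optimal stroke segmentation using a least-squares line-fit residual as the per-segment cost; DPFrag's trained primitive recognizers and learned parameters are not reproduced.

# Illustrative sketch only: choose fragment boundaries that minimize the total
# fitting cost plus a per-segment penalty, via dynamic programming.
import numpy as np

def line_fit_cost(points):
    """Sum of squared distances of the points to their best-fit line."""
    p = points - points.mean(axis=0)
    s = np.linalg.svd(p, compute_uv=False)
    return float(s[1] ** 2) if len(s) > 1 else 0.0

def fragment(points, per_segment_penalty=0.5, min_len=3):
    n = len(points)
    best = np.full(n + 1, np.inf)
    best[0] = 0.0
    back = np.zeros(n + 1, dtype=int)
    for j in range(min_len, n + 1):
        for i in range(0, j - min_len + 1):
            cost = best[i] + line_fit_cost(points[i:j]) + per_segment_penalty
            if cost < best[j]:
                best[j], back[j] = cost, i
    cuts, j = [], n                          # recover segment boundaries
    while j > 0:
        cuts.append(j)
        j = back[j]
    return sorted(cuts)

# Example: an L-shaped stroke fragments near the corner (index ~10).
stroke = np.array([[i, 0] for i in range(10)] + [[9, i] for i in range(1, 10)], float)
print(fragment(stroke))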