The average office worker spends 2.6 hours a day reading and sending e-mails. Add in traditional news feeds and social media platforms such as Twitter, Facebook and LinkedIn, and it’s easy to see why so many of us feel ‘information overload’. Yet within the daily ebb and flow of information are millions of pieces of economically and culturally significant text-based data. The challenge lies in finding innovative ways to sort, analyze and gain insight from these data. As Canada Research Chair in Linguistic Information Visualization, Christopher Collins, PhD, is addressing this challenge.
Dr. Collins and his research team are designing new visualization techniques for working with language data, whether on traditional desktop computers or on natural user interface technologies, such as large touch walls, smart phones, and gesture-based applications. His focus on interactive information graphics, combined with automated language analysis tools that can classify documents by topic or detect emotion in text, will open up huge possibilities for data management.
Imagine a doctor using a gesture-based application to sort through large volumes of medical text with a simple swipe of a hand. Or a marketing professional using a digital whiteboard to determine what customers really think by sorting and exploring tweets based on emotion. Dr. Collins’ research vision is aimed at making scenarios like these a reality.
Through his research, Dr. Collins and his team are forging important connections with business and academic communities. His work demonstrates that the ability to manage text-based data by getting at people’s deeper insights can offer a major competitive advantage.
UOIT Research Excellence Award (Early Career) (professional)
UOIT's Research Excellence Awards recognize faculty who have achieved national and/or international success and recognition through their research activities and enhanced UOIT’s reputation as a research-focused institution.
University of Toronto: PhD, Computer Science, Knowledge Media Design 2010
Collaborated with the Innovis group at the University of Calgary and the Visual Communications Lab at IBM's T.J. Watson Research Laboratory; most recently a visiting researcher at Stanford University's Department of Computer Science
NSERC PGS Scholarship, Recipient of the Adel S. Sedra Distinguished Graduate Award (University of Toronto Award of Excellence)
University of Toronto: MSc, Computer Science 2004
President, Graduate Students' Union, Member, University Affairs Board of Governing Council
Memorial University of Newfoundland: BSc, Computer Science and Chemistry 2001
- Association for Computational Linguistics
- Association for Computing Machinery (ACM)
- ACM Special Interest Group on Computer–Human Interaction
- Computer Linguistics Group, University of Toronto
- Institute of Electrical and Electronics Engineers (IEEE)
- IEEE Computer Society
- IEEE Visualization and Graphics Technical Committee
- Innovations in Visualization Laboratory, University of Calgary
- Knowledge Media Design Institute, University of Toronto
- T. J. Watson Research Center, IBM Research
Media Appearances (6)
UOIT scientists and engineers receive new federal research funding
Fourteen researchers and two graduate students at the University of Ontario Institute of Technology (UOIT) will receive new federal funding for innovative research and scientific discovery, UOIT President Tim McTiernan announced today.
The funding is part of more than $430 million in new research funding unveiled June 22 by the Natural Sciences and Engineering Research Council of Canada (NSERC) through its Discovery Grants program. The new grants will support thousands of top researchers and students at more than 70 Canadian universities.
UOIT’s Dr. Christopher Collins helping Canadians better manage data overload
From tweets and emails to newspapers, billions of pieces of economically and culturally important linguistic data are generated every day around the world. Faced with such an overwhelming volume of textual and visual data, many Canadians – individuals and organizations alike – are struggling with the issue of information overload.
UOIT awarded new Canada Research Chair in Linguistic Information Visualization
The University of Ontario Institute of Technology’s (UOIT) research portfolio is expanding with the announcement of a new Canada Research Chair (CRC) to Dr. Christopher Collins, Assistant Professor, Faculty of Science; and the renewal of another CRC, first awarded in 2009 to Dr. Dan Zhang, Professor, Faculty of Engineering and Applied Science.
Heartbleed update: UOIT researchers analyze why consumers use weak passwords
Internet security is front and centre in the wake of recent news stories about the Heartbleed Internet bug. The online bug has exposed a potential encryption vulnerability and prompted some websites to temporarily shut down, including Canada Revenue Agency’s (CRA) tax filing system. The CRA reported on April 14 that about 900 social insurance numbers were stolen through the bug.
UOIT Assistant Professors discuss password safety with local media
In the wake of the recent Heartbleed bug, which caused a number of websites to shut down due to password security risks, two University of Ontario Institute of Technology (UOIT) faculty members recently discussed their study on password safety with Metroland Media.
You are looking at an open book
The Star online
Books are wonderful things, but they tend to release information rather slowly. Bound ink and paper remains stubbornly linear, as sentences unspool across the page in an orderly but time-consuming fashion.
Those seeking a shortcut often rely on book reviews (and, as it so happens, this paper prints many excellent ones each week). But if you're an academic or a lawyer or anyone else with a specialized area of interest, reviews are unlikely to address your info niche. The solution to assessing unfamiliar books or articles might reside in the colourful pinwheels pictured here.
Topic modeling algorithms are widely used to analyze the thematic composition of text corpora but remain difficult to interpret and adjust. Addressing these limitations, we present a modular visual analytics framework, tackling the understandability and adaptability of topic models through a user-driven reinforcement learning process which does not require a deep understanding of the underlying topic modeling algorithms. Given a document corpus, our approach initializes two algorithm configurations based on a parameter space analysis that enhances document separability. We abstract the model complexity in an interactive visual workspace for exploring the automatic matching results of two models, investigating topic summaries, analyzing parameter distributions, and reviewing documents. The main contribution of our work is an iterative decision-making technique in which users provide a document-based relevance feedback that allows the framework to converge to a user-endorsed topic distribution. We also report feedback from a two-stage study which shows that our technique results in topic model quality improvements on two independent measures.
In the domain of literary criticism, many critics practice close reading, annotating by hand while performing a detailed analysis of a single text. Often this process employs external resources to aid analysis. In this article, we present a study and subsequent tool design focused on leveraging a critic’s annotations as implicit interactions for initiating context-specific computational support that automatically searches external resources. We observed 14 poetry critics performing a close reading, revealing a set of cognitive practices supported through free-form annotation that have not previously been discussed in this context. We used guidelines derived from our study to design a tool, Metatation, which uses a pen-and-paper system with a peripheral display to utilize reader annotations as underspecified interactions to augment close reading. By turning paper-based annotations into implicit queries, Metatation provides relevant supplemental information in a just-in-time manner and acts as a bridge between close and distant reading.
Many visualizations, including word clouds, cartographic labels, and word trees, encode data within the sizes of fonts. While font size can be an intuitive dimension for the viewer, using it as an encoding can introduce factors that may bias the perception of the underlying values. Viewers might conflate the size of a word's font with a word's length, the number of letters it contains, or with the larger or smaller heights of particular characters (‘o’ vs. ‘p’ vs. ‘b’). We present a collection of empirical studies showing that such factors, which are irrelevant to the encoded values, can indeed influence comparative judgements of font size, though less than conventional wisdom might suggest. We highlight the largest potential biases, and describe a strategy to mitigate them.
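As a rough illustration of the encoding the study examines, the sketch below (function and parameter names are hypothetical, not from the paper) maps data values linearly to point sizes, the usual first step in a word cloud or sized-label layout:

```python
def font_size(value, vmin, vmax, min_pt=10, max_pt=72):
    """Linearly map a data value onto a font size in points.

    Note: two words rendered at the same point size can still occupy
    very different areas depending on word length and letter shapes
    ('o' vs. 'p' vs. 'b') -- exactly the kind of value-irrelevant
    factor the studies above measure as a potential bias.
    """
    if vmax == vmin:
        return min_pt
    t = (value - vmin) / (vmax - vmin)
    return min_pt + t * (max_pt - min_pt)

# Example: size three words by (hypothetical) frequency counts.
sizes = {w: font_size(c, 1, 100)
         for w, c in {"data": 100, "text": 50, "font": 1}.items()}
```

The mitigation strategies discussed in the paper operate on top of such a mapping, for instance by adjusting for word length before sizing.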
We present NEREx, an interactive visual analytics approach for the exploratory analysis of verbatim conversational transcripts. By revealing different perspectives on multi-party conversations, NEREx gives an entry point for the analysis through high-level overviews and provides mechanisms to form and verify hypotheses through linked detail-views. Using a tailored named-entity extraction, we abstract important entities into ten categories and extract their relations with a distance-restricted entity-relationship model. This model complies with the often ungrammatical structure of verbatim transcripts, relating two entities if they are present in the same sentence within a small distance window.
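The distance-restricted relation rule can be sketched in a few lines. The version below is a simplification under assumed inputs (pre-tokenized sentences and a pre-extracted entity set; names are illustrative, not the NEREx API):

```python
def relate_entities(sentence_tokens, entities, window=5):
    """Relate two entities if they occur in the same sentence within a
    small token-distance window -- a simplified take on the
    distance-restricted entity-relationship model described above.
    Works on ungrammatical text because it needs no parse tree."""
    positions = [(i, tok) for i, tok in enumerate(sentence_tokens)
                 if tok in entities]
    relations = set()
    for a in range(len(positions)):
        for b in range(a + 1, len(positions)):
            (i, e1), (j, e2) = positions[a], positions[b]
            if e1 != e2 and j - i <= window:
                relations.add((e1, e2))
    return relations
```

For example, in "Alice met Bob yesterday near the station with Carol", a window of 5 tokens relates Alice to Bob but not Alice to Carol.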
Heterogeneous multi-dimensional data are now sufficiently common that they can be referred to as ubiquitous. The most frequent approach has been to propose new visualizations for representing these data. These new solutions are often inventive but tend to be unfamiliar. We take a different approach. We explore the possibility of extending well-known and familiar visualizations through including Heterogeneous Embedded Data Attributes (HEDA) in order to make familiar visualizations more powerful. We demonstrate how HEDA is a generic, interactive visualization component that can extend common visualization techniques while respecting the structure of the familiar layout. HEDA is a tabular visualization building block that enables individuals to visually observe, explore, and query their familiar visualizations through manipulation of embedded multivariate data. We describe the design space of HEDA by exploring its application to familiar visualizations in the D3 gallery. We characterize these familiar visualizations by the extent to which HEDA can facilitate data queries based on attribute reordering.
In this paper we examine how the Minimum Description Length (MDL) principle can be used to efficiently select aggregated views of hierarchical datasets that feature a good balance between clutter and information. We present MDL formulae for generating uneven tree cuts tailored to treemap and sunburst diagrams, taking into account the available display space and information content of the data. We present the results of a proof-of-concept implementation. In addition, we demonstrate how such tree cuts can be used to enhance drill-down interaction in hierarchical visualizations by implementing our approach in an existing visualization tool. Validation is done with the feature congestion measure of clutter in views of a subset of the current DMOZ web directory, which contains nearly half a million categories. The results show that MDL views achieve near constant clutter level across display resolutions. We also present the results of a crowdsourced user study where participants were asked to find targets in views of DMOZ generated by our approach and a set of baseline aggregation methods. The results suggest that, in some conditions, participants are able to locate targets (in particular, outliers) faster using the proposed approach.
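The flavor of an MDL score for a tree cut can be conveyed with a toy version: the cost of a cut is a model term that grows with the number of clusters plus a data term measuring how well the clusters compress the observations. This is illustrative only and not the paper's formulae, which are tailored to treemaps and sunbursts and account for display space:

```python
import math

def description_length(cut_counts, total):
    """Toy MDL score for a tree cut.

    cut_counts: item count under each node of the cut.
    total: total number of items in the tree.

    Model cost penalizes cuts with many clusters (clutter); data cost
    is the code length of the items under the cut's distribution
    (information loss).  Minimizing the sum balances the two.
    """
    k = len(cut_counts)
    model_cost = (k / 2) * math.log2(total)      # cost of describing the cut
    data_cost = -sum(c * math.log2(c / total)    # cost of the data given it
                     for c in cut_counts if c > 0)
    return model_cost + data_cost
```

Candidate cuts of the hierarchy would be scored this way and the minimum chosen, trading detail against clutter.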
We introduce a novel visual analytics approach to analyze speaker behavior patterns in multi-party conversations. We propose Topic-Space Views to track the movement of speakers across the thematic landscape of a conversation. Our tool is designed to assist political science scholars in exploring the dynamics of a conversation over time to generate and prove hypotheses about speaker interactions and behavior patterns. Moreover, we introduce a glyph-based representation for each speaker turn based on linguistic and statistical cues to abstract relevant text features. We present animated views for exploring the general behavior and interactions of speakers over time and interactive steady visualizations for the detailed analysis of a selection of speakers. Using a visual sedimentation metaphor we enable the analysts to track subtle changes in the flow of a conversation over time while keeping an overview of all past speaker turns. We evaluate our approach on real-world datasets and the results have been insightful to our domain experts.
In this work, we introduce a novel visualization technique, the Temporal Intensity Map, which visually integrates data values over time to reveal the frequency, duration, and timing of significant features in streaming data. We combine the Temporal Intensity Map with several coordinated visualizations of detected events in data streams to create PhysioEx, a visual dashboard for multiple heterogeneous data streams. We have applied PhysioEx in a design study in the field of neonatal medicine, to support clinical researchers exploring physiologic data streams. We evaluated our method through consultations with domain experts. Results show that our tool provides deep insight capabilities, supports hypothesis generation, and can be well integrated into the workflow of clinical researchers.
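The core integration idea can be sketched simply: sum each stream value over a sliding time window, so sustained or frequent events accumulate into high intensity while brief blips fade. This is a minimal sketch under assumed inputs (a plain list of samples), not the published technique, which also handles timing and rendering:

```python
def temporal_intensity(stream, window=5):
    """Integrate a data stream over a trailing time window.

    Returns one intensity value per sample: the sum of the current
    sample and the (window - 1) samples before it.  Frequent or
    long-lasting features yield high intensity; isolated spikes decay
    once they leave the window.
    """
    out = []
    for i in range(len(stream)):
        lo = max(0, i - window + 1)
        out.append(sum(stream[lo:i + 1]))
    return out
```

In a dashboard like PhysioEx, such per-sample intensities would then be mapped to color to form the visual heat strip.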
We present FluxFlow, an interactive visual analysis system for revealing and analyzing anomalous information spreading in social media. Every day, millions of messages are created, commented on, and shared by people on social media websites such as Twitter and Facebook. This provides valuable data for researchers and practitioners in many application domains, such as marketing, to inform decision-making. Distilling valuable social signals from the huge crowd's messages, however, is challenging due to the crowd's heterogeneous and dynamic behaviors. The challenge lies in data analysts' ability to discern anomalous information behaviors, such as the spreading of rumors or misinformation, from more conventional patterns, such as popular topics and newsworthy events, in a timely fashion. FluxFlow incorporates advanced machine learning algorithms to detect anomalies, and offers a set of novel visualization designs for presenting the detected threads for deeper analysis. We evaluated FluxFlow with real datasets containing the Twitter feeds captured during significant events such as Hurricane Sandy.
We introduce a new direct manipulation technique, DimpVis, for interacting with visual items in information visualizations to enable exploration of the time dimension. DimpVis is guided by visual hint paths which indicate how a selected data item changes through the time dimension in a visualization. Temporal navigation is controlled by manipulating any data item along its hint path. All other items are updated to reflect the new time. We demonstrate how the DimpVis technique can be designed to directly manipulate position, colour, and size in familiar visualizations such as bar charts and scatter plots, as a means for temporal navigation. We present results from a comparative evaluation, showing that the DimpVis technique was subjectively preferred and quantitatively competitive with the traditional time slider, and significantly faster than small multiples for a variety of tasks.
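The hint-path interaction reduces to a simple lookup: given where the user has dragged the selected item, find the time step whose precomputed path point is nearest. The sketch below is a simplified version under assumed inputs (one (x, y) path point per time step); the actual technique also interpolates between time steps:

```python
def nearest_time(drag_pos, hint_path):
    """Map a dragged cursor position to a time step.

    hint_path holds one (x, y) position per time step for the selected
    item; the returned index is the time step whose path point lies
    closest to the drag position, so dragging the item along its hint
    path scrubs through time.
    """
    dists = [(px - drag_pos[0]) ** 2 + (py - drag_pos[1]) ** 2
             for px, py in hint_path]
    return dists.index(min(dists))
```

On each drag event the visualization would jump to the returned time step and update every other item accordingly.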
As multi-touch computing becomes more common, students increasingly need to learn how to create software for multi-touch environments. Although many powerful toolkits already exist, they require a strong programming background, making them difficult to integrate into fast-paced human-computer interaction (HCI) courses or for non-CS students to use. Researchers at the University of Ontario Institute of Technology (UOIT) and the University of Waterloo (UW) have developed a toolkit with a simplified API called the Simple Multi-Touch Toolkit (SMT).