Text and Graph Analytics

The Text and Graph Analytics (TGA) group at XRCI researches and develops solutions for analyzing large-scale unstructured text and graph data to solve real-life business problems. The sharp rise in unstructured data in recent years has been attributed to the ease of generating content through various communication channels and to the steady increase in connections among people, which has led to the evolution of large-scale network structures. TGA sees this as an opportunity to solve real-life business problems emerging from big text data and networks. The group is exploring text and graph analytics challenges in verticals such as social media, customer care, healthcare, and education.

Project Themes

  • Customer Care
  • Social Media Analytics
  • Healthcare
  • Education

Customer Care

Xerox offers customer service across verticals such as healthcare, transportation, and telecommunications, over channels including telephone, SMS, chat, social networks, and email. Xerox research is working on the future of these services, namely introducing automated conversational agents that provide personalized customer care. The overarching goal of these efforts is to give every customer a stellar experience. We are actively building a next-generation Customer Experience Management framework that can automatically maintain a balance between a brand's customer care architecture and a better customer experience. The problems we work on lie at the intersection of natural language processing, machine learning, and cognitive science.

In the customer care domain, we are also working on a number of graph modelling and (topological) graph analytics techniques. Graph modelling of customer care entities helps capture different aspects of the direct and indirect interactions among entities such as customers, products, brands, problems, and communication channels. We have studied customer modelling and churn prediction using hypergraph modelling and the associated multi-scale analysis of large graphs based on spectral graph theory. The team is employing data-driven techniques to automatically identify the drivers of dissatisfaction among churners in a large organization. More recently, the team has also worked on variations of the influence maximization problem in large-scale social networks, with important applications in customer communications and marketing. We are also applying graph modelling and analytics techniques in novel domains such as human resources, for talent acquisition and intelligent hiring solutions, and in cloud-based resource marketplaces.


Social Media Analytics

Social media reflects the opinions of the masses on diverse issues. The TGA team is working on social media analytics to strengthen social engagement and to analyse customer interactions, opinions, and trends. Our team manages and interprets noisy short text from social media and transforms it into insights that help businesses become more efficient. Besides developing novel techniques for generating insights from such text for specific business requirements, we are working to make our solutions reusable and adaptable to different scenarios with minimal overhead. Social media data is short-lived (typically triggered by some event or incident), so a fast turnaround in generating actionable insights is key. We are looking into transfer learning approaches for microblog categorization, so that a machine learning based categorization algorithm trained for one scenario can be applied to new use cases (e.g., transportation) with minimal or no labelled data from the latter. The ability to reuse existing knowledge lets our solutions address realistic scenarios in which data is collected for different applications and analysis must be performed in close to real time. Our team is also working to take this beyond social media to applications where analysing data in real time (as and when it arrives) is crucial, such as healthcare and transportation.

We are particularly interested in leveraging massive social data and knowledge, with the aid of sophisticated text and graph modelling techniques, to improve urban infrastructure and to increase the active participation of citizens in planning the future of their city. Urban infrastructure includes aspects such as connectivity and transport, waste management, water, energy, and the environment. Further, we aim to understand the underlying social and behavioural dynamics of netizens and how they may affect the city planning process.


Healthcare

Considerable valuable information lies unmined in the unstructured textual artefacts of healthcare, and text analytics solutions are key to unlocking it for use by the various stakeholders in the healthcare ecosystem. These textual artefacts are diverse and include patient healthcare records, medical literature, and user-generated content on the web. Their language and vocabulary differ widely, which makes unifying concepts across them difficult. This, coupled with the low or zero error tolerance required for clinical decision making in critical illness, makes it essential to develop high-quality text analytics solutions with low error margins for healthcare.

Our work on text analytics (TA) for healthcare encompasses research on building an information extraction platform that reconciles the different vocabularies present in healthcare text artefacts, and on building TA applications for consumer-centric healthcare (CCH) and clinical informatics on top of it. In CCH, we are building a system that delivers personalized, actionable health information, tailored to the individual patient, from diverse real-time data sources, enabling the patient to take action for wellness and to participate in shared health decision making. In clinical informatics, we are building TA applications to aid clinical decision making, risk stratification of patient populations, and pharmacovigilance.



Education

Assessing students' acquired knowledge is one of the key aspects of a teacher's job. It is typically achieved by evaluating and scoring students' responses in classroom assessments such as quizzes, examinations, and worksheets. However, assessment is a monotonous, repetitive, and time-consuming job, and it is often seen as an unrewarding overhead. “Computer Assisted Assessment” has been prevalent in schools and colleges for many years, albeit for questions with constrained answers such as Multiple Choice Questions (MCQs). While answers to MCQs are easier for computers to assess, MCQs have been reported to suffer from multiple shortcomings compared with questions requiring free-text answers. In TGA, we are working on developing an automatic scoring system for open-ended questions that elicit short free-text answers, using techniques from NLP, large-scale text mining, and cognitive science.
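As a rough illustration of what scoring a short free-text answer can involve, the sketch below compares a student's answer with a reference answer using bag-of-words cosine similarity and scales the result to a mark. This is a generic textbook baseline and not the TGA system; the function names, the `max_marks` parameter, and the sample sentences are all assumptions made for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty answer shares nothing with the reference
    return dot / (norm_a * norm_b)


def score_answer(student_answer, reference_answer, max_marks=5):
    """Hypothetical scorer: scale similarity to the reference into marks."""
    sim = cosine_similarity(
        Counter(tokenize(student_answer)),
        Counter(tokenize(reference_answer)),
    )
    return round(sim * max_marks, 2)


if __name__ == "__main__":
    reference = "Plants make food from sunlight, water and carbon dioxide"
    print(score_answer("Plants use sunlight and water to make food", reference))
    print(score_answer("I do not know", reference))
```

A pure word-overlap baseline like this rewards shared vocabulary only, so it misses paraphrases and cannot tell a correct statement from a negated one, which is part of why free-text scoring is treated as a research problem.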