September 11, 2017

A Cognitive Visual Storyteller


Each one of us is a storyteller.

The stories we tell are born out of our experiences and our interpretation of the world we see around us. They become bigger stories when we engage in conversation with those around us. Conversations are useful because they form the journeys we undertake towards specific goals. A very large part of these journeys, however, remains only as memories, because we don't always record every moment, every conversation, every part of the journey. Then there are those who do tell their stories, like poets, authors, journalists, filmmakers, photographers, artists and musicians, using the tools and the expertise they have built up. Most of us, however, have to be content with the conversations we have, and hope that we can remember at a later date the things we said today.

But consider this. Thanks to the advances technology has made in the last few years, we all carry the ubiquitous smartphone: that wonderful device which, apart from letting us make and receive calls, also lets us record the world around us. And given that most of us use our phones extensively to take pictures, we already have enough material, by way of images, to tell wonderful stories about the world around us. The only thing stopping us is the little bit of effort we need to put in: the effort to put some images together with supporting words and tell coherent visual stories. And because it is an effort, what most of us end up doing is simply uploading our photographs to the social media platforms of our choice.

Going a bit further, now consider this. What if our device could automatically create a meaningful visual story from the photographs we have on our phones, and allow us to put in the finishing touches before sharing it with our friends? This is not wishful thinking. We have the technologies today to do this!

What follows is an idea I have on how this could be done. It is purely conceptual: a possible direction for developing an application, not the solution itself.

My idea considers two specific technologies in addition to the mobile platform and the related services that are required to build mobile apps.

The first one is Visual Recognition.

Visual Recognition is the ability to find meaning in visual content. It allows us to analyze images for scenes, objects, faces and other content, create custom classifiers, and tag images accordingly, which in turn makes it possible to locate similar images within a collection. Using such a service, we can develop smart applications that analyze the visual content of images or video frames to understand what is happening in a scene. Visual Recognition uses deep learning algorithms to analyze images and give insights into their content. With it, we can organize image libraries, understand an individual image, recognize food, detect faces, and create custom classifiers for the specific results we need.

The cognitive services of IBM Watson on IBM's Bluemix Platform can be considered here. One such service is the Visual Recognition service, which is available on the Bluemix Platform. This service has a General Classification feature that generates class keywords describing an image. We can use our own images, or extract relevant image URLs from publicly accessible image sources, for analysis. There is a Face Detection feature which detects human faces in an image; it also provides a general indication of the age range and gender of each face. The Visual Training feature allows for the creation of custom, unique visual classifiers, so we can use the service to recognize visual concepts that are not available with general classification. Using this, models can be trained and then incorporated into the apps.
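To make the classification step concrete, here is a minimal sketch of how an app might keep only the confident class keywords for a photograph. The nested JSON shape loosely mirrors the structure of a Visual Recognition classify result, but the field values and the sample photograph below are invented purely for illustration.

```python
# Sketch: extract confident class keywords from a classification result.
# The structure mirrors a Visual Recognition-style response; all values
# here are invented for illustration.

SCORE_THRESHOLD = 0.7  # keep only reasonably confident keywords

def confident_keywords(response, threshold=SCORE_THRESHOLD):
    """Return class keywords whose confidence score meets the threshold."""
    keywords = []
    for image in response.get("images", []):
        for classifier in image.get("classifiers", []):
            for cls in classifier.get("classes", []):
                if cls["score"] >= threshold:
                    keywords.append(cls["class"])
    return keywords

# A hypothetical result for one holiday photograph
sample_response = {
    "images": [{
        "classifiers": [{
            "classifier_id": "default",
            "classes": [
                {"class": "mountain", "score": 0.92},
                {"class": "snow", "score": 0.81},
                {"class": "vehicle", "score": 0.34},
            ],
        }],
    }],
}

print(confident_keywords(sample_response))  # ['mountain', 'snow']
```

These keywords are what the app would later match against a story theme; low-scoring classes are dropped so weak guesses don't pollute the story.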

Shown below is the process of creating and using the classifier:

This will allow us to train the models that can be imported into the app.

Here is a video on the IBM Watson Cognitive Visual Recognition Service.

The second service I would use is the Watson Conversation service. It will allow us to add a natural language interface to the application to automate interactions with end users. We can train the Watson Conversation service through an easy-to-use web application, designed so that we can quickly build natural conversation flows between the app and its users.

This would allow users to specify, in simple language, the parameters around which they would like to build their visual stories. The parameters could be location, events, dates and other details that fit into building a story around a theme.
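As a sketch of how the conversation step could feed the rest of the app, the snippet below maps a Conversation-style interpretation of the user's message into story parameters. The intent name, entity names and values here are invented examples of what a trained workspace might return, not the service's actual output.

```python
# Sketch: turn a Conversation-style interpretation of the user's request
# into story-building parameters. Intent and entity names are invented
# examples of what a trained workspace might return.

def story_parameters(conversation_output):
    """Collect theme-building parameters (e.g. theme, location, dates)
    from the entities detected in the user's message."""
    params = {}
    for entity in conversation_output.get("entities", []):
        params.setdefault(entity["entity"], []).append(entity["value"])
    return params

# A hypothetical interpretation of "build a story about my mountain trek
# last weekend"
sample_output = {
    "intents": [{"intent": "build_story", "confidence": 0.95}],
    "entities": [
        {"entity": "theme", "value": "life in the mountains"},
        {"entity": "date", "value": "2017-09-09"},
        {"entity": "date", "value": "2017-09-10"},
    ],
}

print(story_parameters(sample_output))
```

The resulting dictionary of parameters is what the app would hand to the image-selection step.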

Shown below is a video which is an overview of the IBM Watson Conversation Service.

To see how this app would be used in the real world, let us consider this scenario. Let's say that I am on a holiday in the mountains somewhere. I have my iPhone with me and I use it to take many pictures of the various places that I visit. I spend the day just visiting the various sights and taking photographs.

My phone has the ability to tag these photographs by location, so the images are already tagged by location and by date.
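Those existing tags are already enough for a first cut at selection. Here is a minimal sketch of filtering a photo library by location and date range; the photo records, file names and place names are invented for illustration.

```python
# Sketch: select photographs using the location and date tags the phone
# already records. The library below is invented for illustration.
from datetime import date

photo_library = [
    {"file": "IMG_001.jpg", "location": "Manali", "taken": date(2017, 9, 9)},
    {"file": "IMG_002.jpg", "location": "Manali", "taken": date(2017, 9, 10)},
    {"file": "IMG_003.jpg", "location": "Delhi",  "taken": date(2017, 9, 2)},
]

def photos_for(location, start, end, library=photo_library):
    """Return files taken at a location within an inclusive date range."""
    return [p["file"] for p in library
            if p["location"] == location and start <= p["taken"] <= end]

print(photos_for("Manali", date(2017, 9, 9), date(2017, 9, 10)))
# ['IMG_001.jpg', 'IMG_002.jpg']
```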

At the end of the holiday, or maybe at the end of each day, I would like to build a visual story using some of these images and share it with my friends.

What most people do, including me most of the time, is upload some of these photographs to social media sites like Facebook, Instagram or Twitter, with or without captions, and share them with friends. The other option is to upload the entire lot to photo sharing sites like Flickr or Google Photos, create an album, and share the link. But this is just an album: the best you can do is create separate albums by theme and hope others can see the underlying storyline (if you intend for one to be there at all).


But with the concept app that I am talking about, I could actually build a visual story automatically around a specific theme.

I open up my visual storytelling app and it gives me some suggestions for themes around my current location. These could be based on the photographs I have on my phone (and maybe on public photographs of that location available on the net as well).

What if I decide I want to build my own theme? So I start up a conversation with my app, specifying that I would like to build a story around life in the mountains. During the conversation, the app finds out more details that would help it identify the right images for the story. Once I am satisfied that the app has understood what I have in mind, I ask it to go ahead and locate all the images that can be used.


The app can then identify the relevant images based on the understanding it has gained from the conversation, the visual recognition algorithms, and the classifiers that have been built into the app. I would then be presented with a selection of images around the theme, with some basic captions.
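Putting the pieces together, here is a sketch of that selection step: images are matched against the theme's keywords, ranked by their best classifier score, and given a draft caption. The tags would come from the visual recognition step; the image records, keyword set and caption template below are invented for illustration.

```python
# Sketch: rank tagged images against a chosen theme and draft basic
# captions. Tags would come from the visual recognition step; all the
# sample data here is invented for illustration.

theme_keywords = {"mountain", "snow", "trek", "valley"}

tagged_images = [
    {"file": "IMG_001.jpg", "tags": {"mountain": 0.92, "snow": 0.81}},
    {"file": "IMG_002.jpg", "tags": {"valley": 0.77, "river": 0.60}},
    {"file": "IMG_003.jpg", "tags": {"food": 0.88}},
]

def select_for_theme(images, keywords, min_score=0.7):
    """Keep images with at least one confident theme keyword, ranked by
    their best matching score, with a caption drafted from that keyword."""
    picks = []
    for img in images:
        matches = {t: s for t, s in img["tags"].items()
                   if t in keywords and s >= min_score}
        if matches:
            best = max(matches, key=matches.get)
            picks.append({"file": img["file"],
                          "caption": f"A glimpse of the {best}",
                          "score": matches[best]})
    return sorted(picks, key=lambda p: p["score"], reverse=True)

for pick in select_for_theme(tagged_images, theme_keywords):
    print(pick["file"], "-", pick["caption"])
```

The off-theme food photograph drops out, and the remaining images arrive in ranked order, ready for the user to reorder, recaption, or prune.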

I could then make changes to the captions, or the order of the images, give the story a title, maybe add or delete images to enhance the basic story, and once I am satisfied I could then publish the story. Once it is published I could share it with others on social media and my followers.

For those who want to create their own stories manually, the app would allow them to select images, caption them, and lay them out visually as desired. This is for those who like to use their own creativity instead of relying on machine intelligence!


The uniqueness of this app is that, using its cognitive capabilities, it would understand the conversation and arrive at the theme the user is looking for, and then its visual recognition capabilities would identify the right images and help build the visual story.

This would help everyone create meaningful visual stories using resources that are readily available to them.

One would be the phone they already have, with its ability to take photographs. The other would be the cognitive app that identifies the theme and the relevant images that can be used to build a meaningful visual story.

Together, these could turn this cognitive visual storyteller app into something that easily makes everyone a successful visual storyteller!

1 Comment
  1. Pankaj Verma September 11, 2017

    Very good explanation of how Cognitive capability can be used for storytelling.

