
Abstract— Learning creates the ability to expand one's mind, and it is a minimum requirement for success in life. Erudite people read an average of 2-3 hours a day, and continuous learning can create new insights and perspectives on life. An efficient method of learning makes it possible to attract even non-readers. As described here, an efficient method of learning can be built out of representations such as snippets, maps and graphs. It involves extracting content of any type, such as a newspaper or e-book, and compressing it to bring forth a pictorial representation. It supports reformative education and saves time during the analysis of surveys and feedback.

Index Terms— Sentiment analysis, Visual representation, Pictographic memory, Feedback, Query analysis, Survey.

I. INTRODUCTION

One of the major problems associated with learning is that protracted essays and stories make the reader lose focus on the nitty-gritty of the magnum opus. In August 2013, Booktrust commissioned DJS Research, an independent research company, to carry out a quantitative research project investigating the reading habits and attitudes of adults in England, and to examine the relationship between reading habits, attitudes to reading and demographic factors such as socio-economic group, age and gender. It is always easier for a person to remember what he sees than what he hears. Pictographic memory makes it convenient for the user to observe, understand and learn concepts. This is the reason why people tend to remember the story of a movie they watched months ago better than the essay they read today.

Figure 1: Understanding things visually is easier than through words.

Recently, deep convolutional and recurrent networks for text have yielded highly discriminative and generalizable (in the zero-shot learning sense) text representations learned automatically from words and characters (Reed et al., 2016) [2-4]. These approaches exceed the previous state of the art using attributes for zero-shot visual recognition on the Caltech-UCSD Birds database (Wah et al., 2011), and are also capable of zero-shot caption-based retrieval. With these upgrades, it is possible to map a text sentence to its corresponding image in a hash table. Converting an entire text document or voice note to its corresponding visual representation in the form of images, graphs and charts will enable learning in a better way [1].

II. LITERATURE SURVEY

Detecting text-based images with optical character recognition for English translation and speech using Android: an application that allows a smartphone to capture an image, extract the text from it, translate the text into English and speak it out is no longer a dream. In this study, an Android application was developed by integrating the Tesseract OCR engine, the Bing translator and the phone's built-in text-to-speech technology. The final deliverable was tested by target end users from different language backgrounds, and the study concluded that the application benefits many users.

Figure 2: The circle represents the content gathered from television, the internet and radio.

Automatic detection and translation of text from natural scenes: example-based machine translation technology for sign translation, with a prototype system for Chinese sign translation. This system is capable of capturing images, automatically detecting and recognizing text, and translating the text into English. The translation can be displayed on a palm-size PDA, or synthesized as a voice output message over earphones [5].
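The capture-recognize-translate pipeline surveyed above can be sketched in a few lines of Python. The following is a minimal illustration, not the original Android implementation: it assumes the pytesseract wrapper for the Tesseract OCR engine and the Pillow imaging library are installed, and the translation and speech steps (Bing and the phone's text-to-speech in the study) are not reproduced.

```python
# Minimal OCR sketch; file name and language pack below are hypothetical.
from PIL import Image
import pytesseract

def extract_text(image_path: str, lang: str = "eng") -> str:
    """Run the Tesseract OCR engine over a captured image."""
    return pytesseract.image_to_string(Image.open(image_path), lang=lang)

if __name__ == "__main__":
    # 'sign.png' and the 'chi_sim' traineddata pack are assumed to exist.
    text = extract_text("sign.png", lang="chi_sim")
    print(text)  # this string would next be sent to a translation API
```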
Universal syllable tokeniser for language identification [3]: this paper compares two such LID systems, namely a Gaussian Mixture Model (GMM) tokeniser and a syllable-based LID system. The phonotactics of the GMM and syllable-based systems are captured by GMM cluster indices and syllable tokens respectively. The authors propose the use of universal syllable models in building the LID systems and then deriving the unigram syllable statistics from this model. Experimental results on the OGI 1992 multilingual speech corpus show that the syllable-based LID system performs significantly better than the GMM tokeniser system.

Unsupervised image translation [7]: a Bayesian technique for inferring the most likely output image. The prior on the output image, P(X), is a patch-based Markov random field obtained from the source image. The likelihood of the input, P(Y|X), is a Bayesian network that can represent different rendering styles. The authors describe a computationally efficient, probabilistic inference and learning algorithm for inferring the most likely output image and learning the rendering style. They also show that current techniques for image restoration or reconstruction proposed in the vision literature, as well as image-based non-photorealistic rendering, can be seen as special cases of their model.

III. PROPOSED SYSTEM

The system makes use of two modules, the Generator module and the Discriminator module. The Generator module converts the received input message or voice to a specified standard syntax. The output of the Generator module is given to the Discriminator module, which classifies the given input and maps the text to an index referring to the corresponding visual (an image, graph or chart). This process is carried out in a feed-forward manner and thus generates the required sequence of visuals.

Fig: Feed-forward text (xn) to image (yn) mapping
Fig 3: Working of the Generator and Discriminator

IV. IMPLEMENTATION

Language Translator: Multiple languages around the world require different character representations. Fortunately, all characters can be encoded in UTF-8 Unicode. UTF-8 is a variable-width encoding scheme that uses up to four bytes per character, and it is the most widely used encoding scheme for Web pages, although additional character sets are also found on websites. Language translation software prefers UTF-8 encoding, and text should be converted into UTF-8 prior to translation [5]. The encoding scheme in use may be detected by discovering the Web page's character set or encoding declaration. Most Web pages can easily be converted to UTF-8 using the Python library Beautiful Soup; additional programming is required for pages with missing character set declarations. Using publicly available software over the Internet, the steps necessary to translate a website from a foreign language into English are: find the website and download the HTML content; remove the unnecessary HTML, script tags and excess white space; and, if necessary, capture the text and convert it into UTF-8 [4].

Fig 4: Language translator design flow
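The download-and-convert steps listed above can be sketched as follows. This is a minimal sketch assuming the requests and beautifulsoup4 packages; the URL handling and tag choices are illustrative rather than the paper's exact code.

```python
import requests
from bs4 import BeautifulSoup

def page_text_as_utf8(url: str) -> bytes:
    """Download a page, strip non-text markup, and return UTF-8 bytes."""
    html = requests.get(url, timeout=10).content
    # Beautiful Soup detects the declared character set and decodes to Unicode.
    soup = BeautifulSoup(html, "html.parser")
    # Script and style tags carry no translatable text, so drop them.
    for tag in soup(["script", "style"]):
        tag.decompose()
    # Collapse excess white space, then encode the remaining text as UTF-8.
    text = " ".join(soup.get_text(separator=" ").split())
    return text.encode("utf-8")
```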
Tokeniser: The tokeniser receives a stream of characters and breaks it up into individual tokens; tokens here refer to each word in the content. This is done by removing punctuation marks from the reviews, removing stop words from the tokenised words, and performing a spell check on the filtered words using edit distance (with distance 2). Edit distance is a way of quantifying how dissimilar two strings are by counting the minimum number of operations required to transform one string into the other [14].

The major elements in this approach are the syllable segmentation algorithm, the universal syllable set and a representative syllable model for all the languages. To create the set of universal syllables, we randomly select n syllables (s1, s2, s3, ..., sn) from each of the languages. We assume that these syllables encompass the frequently occurring syllables in the languages [15]. These syllables are then clustered using the syllable clustering algorithm. The clustering process results in L clusters, which we refer to as the set of universal syllable models.

Classifier: The term "classifier" refers to the mathematical function, implemented by a classification algorithm, that maps input data to a category.

How did we evaluate?

Step 1: Built different classifiers by adopting different machine learning algorithms such as K-nearest neighbours, decision trees, SVM and Naïve Bayes, changing the parameters for each classifier [10][13].

Table 1: Multiple meanings of the word "light".
  Verb:      I had to light the candle.
  Noun:      Please turn on the lights.
  Adjective: I had a light lunch.

Step 2: Evaluated the performance of the classifiers using different generators.

A. Hold-out, 10-fold cross-validation: a technique to evaluate predictive models by partitioning the original sample into 10 folds; in each run, 9 folds are used to train the model and the remaining fold is held out as the test set to evaluate it [12].

B. Resample generator: ran these generators for 10 runs, reshuffling the data on each run. This ensures a different order of inputs into the feed-forward system.

C. Sentiment analysis: the process of extracting or identifying the subject through natural language processing. It aims to determine the attitude of the end user and the way that the user understands the content [11].

SVM classifier [12]: the SVM classifier uses a large margin for classification. It separates the tweets using a hyperplane. SVM uses a discriminant function defined as

g(X) = w^T φ(X) + b

where X is the feature vector, w is the weight vector and b is the bias. φ(·) is the non-linear mapping from the input space to a high-dimensional feature space. w and b are learned automatically on the training set.

Maximum entropy classifier [11]: in the maximum entropy classifier, no assumptions are made regarding the relationships between features. This classifier tries to maximize the entropy of the system by estimating the conditional distribution of the class label, defined as

P(y|X) = (1/Z(X)) exp{ Σ_i λ_i f_i(X, y) }

where X is the feature vector, y is the class label, Z(X) is the normalization factor and λ_i is the weight coefficient. f_i(X, y) is the feature function, defined as f_i(X, y) = 1 if X = x_i and y = y_i, and 0 otherwise.
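The two decision rules above can be made concrete with a small numerical sketch. The feature map, weights and classes below are toy values chosen for illustration, not parameters learned in the paper; φ is taken as the identity map for the SVM score.

```python
import numpy as np

def svm_score(x: np.ndarray, w: np.ndarray, b: float) -> float:
    """g(X) = w^T phi(X) + b, with phi taken as the identity map here."""
    return float(w @ x + b)

def maxent_probs(x, lambdas, feature_fn, classes):
    """P(y|X) = exp(sum_i lambda_i * f_i(X, y)) / Z(X)."""
    scores = {y: np.exp(lambdas @ feature_fn(x, y)) for y in classes}
    z = sum(scores.values())  # normalization factor Z(X)
    return {y: s / z for y, s in scores.items()}

# Toy usage: a two-class problem with two hand-picked indicator features.
phi = lambda x, y: np.array([x[0] if y == "pos" else 0.0,
                             x[1] if y == "neg" else 0.0])
x = np.array([1.0, 2.0])
print(svm_score(x, w=np.array([0.4, -0.2]), b=0.1))
print(maxent_probs(x, lambdas=np.array([0.5, 0.3]), feature_fn=phi,
                   classes=["pos", "neg"]))
```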
Text to Image Translation: Deep convolutional generative adversarial networks have begun to generate highly compelling images of specific categories, such as faces, album covers and room interiors. Solving this challenging problem requires solving two sub-problems: first, learning a text feature representation that captures the important visual details; and second, using this representation to synthesize a compelling image that a human might mistake for real [1]. In real-world applications, however, images rarely appear in isolation: they are often accompanied by unstructured textual descriptions, such as on web pages and in books. The additional information from these descriptions can be used to simplify the image modelling task.

Fig 5: Text-to-image translation workflow

Sentence interpolation: Although there is no ground-truth text for the intervening points, the generated images appear plausible. Since we keep the noise distribution the same, the only changing factor within each row is the text embedding that we use. Note that interpolations can accurately reflect colour information, such as a bird changing from blue to red while the pose and background are invariant. Here, we sample two random noise vectors; by keeping the text encoding fixed, we interpolate between these two noise vectors and generate bird images with a smooth transition between two styles while keeping the content fixed [9].

Algorithm used to produce visuals from text: The algorithm uses recursion to implement the feed-forward mapping of text to images, repeating after every successful mapping until the entire content is mapped to its corresponding visuals [13][6].

Input: minibatch of images x, matching text t, mismatching text t̂, number of training batch steps S
Algorithm: func(D, G) {Discriminator and Generator networks}
Step 1: h ← φ(t) {encode the matching text description}
Step 2: ĥ ← φ(t̂) {encode the mismatching text description}
Step 3: z ~ N(0, 1)^Z {draw a sample of random noise}
        x̂ ← G(z, h) {forward through the generator}
        straight ← D(x, h) {real image, right text}
Step 4: left ← D(x, ĥ) {real image, wrong text}
Step 5: right ← D(x̂, h) {fake image, right text}
Step 6: L_D ← log(straight) + (log(1 − left) + log(1 − right)) / 2
        L_G ← log(right)
Step 7: func(D − α ∂L_D/∂D, G − α ∂L_G/∂G) {recursive feed-forward for the next image}

Table 2: Steps to implement the Discriminator and Generator networks.

Fig 6: ROC curves using cosine distance between predicted style vectors on same vs. different style image pairs. Left: image pairs reflect same or different pose. Right: image pairs reflect same or different average background colour.
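The recursive call in Step 7 corresponds to one matching-aware GAN update per minibatch. Below is a hedged PyTorch sketch of a single such step; G, D and the text encoder phi are assumed to be user-defined nn.Modules with D outputting probabilities in (0, 1), and the optimizers, noise dimension and batch shapes are illustrative rather than the paper's exact configuration.

```python
import torch

def train_step(G, D, phi, x, t, t_mis, opt_D, opt_G, z_dim=100):
    """One matching-aware update, following Table 2 (Steps 1-7)."""
    h = phi(t)                         # Step 1: encode matching text
    h_mis = phi(t_mis)                 # Step 2: encode mismatching text
    z = torch.randn(x.size(0), z_dim)  # Step 3: z ~ N(0, 1)^Z
    x_fake = G(z, h)                   #         forward through the generator
    straight = D(x, h)                 #         real image, right text
    left = D(x, h_mis)                 # Step 4: real image, wrong text
    right = D(x_fake.detach(), h)      # Step 5: fake image, right text
    # Step 6: discriminator objective L_D is ascended, so minimise -L_D.
    L_D = torch.log(straight) + (torch.log(1 - left) + torch.log(1 - right)) / 2
    opt_D.zero_grad()
    (-L_D.mean()).backward()
    opt_D.step()
    # Generator objective L_G = log(D(fake image, right text)), also ascended.
    L_G = torch.log(D(x_fake, h))
    opt_G.zero_grad()
    (-L_G.mean()).backward()
    opt_G.step()
    # Step 7 recurses in the paper; in practice this function is called
    # in a loop over the S training batches.
```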
V. SCOPE

Newspapers & social media content: News from various sources, such as television broadcasts, additional information about the incident posted on the internet, and radio, is merged. This information is compressed using a compression algorithm and turned into a summary that can be visualized.

Crime case investigation: In a crime case, the police can obtain a visualization of the entire crime scene by arranging the evidence gathered at the crime spot.

Figure 7: Data to information mapping

Template & content: This technique gives developers a template to create interactive data and story lines quickly, and to express the relationships between various objects and methods and their interactions in a realistic way. By doing so, it provides a focused, interactive and more concise reading experience for the end reader.

Learning: This can be used while reading story-book content. The whole story scenario is given as input and the story is analysed using sentiment analysis. The analysed data is compressed and represented in pictorial form. For instance, if the story revolves around some place, say a restaurant, then a real-time experience of that exact restaurant is shown visually to the end user [8].

Process summary: The input message is first run through the language translator. The translated content is then run through the tokeniser and spell checker. The tokenised contents are run through the classifier to extract the right insight from the message; finally, the objects are mapped to their corresponding visual representations, connected using the connectives defined in the tokeniser. The final image results are mapped to their corresponding data using the hash-map table. This requires no more than uploading two spreadsheets of data through a customized uploader: one that focuses on the data, the other on the storytelling. Users can easily upload their dataset and link their narrative to certain data points. The data visualization changes as the reader scrolls through the article, which essentially transforms the reading experience.

Fig 7: The workflow of data-driven storytelling

VI. CONCLUSION

In this work we developed a simple and effective model for generating images based on detailed visual descriptions, to take the perspective of seeing things to a new level and make the process of learning simple. We showed that the model can synthesize many plausible visual interpretations of a given text caption, and then matched them with data in the visual descriptions using sentence interpolation to increase the accuracy of the model. Our manifold interpolation regularizer substantially improved text-to-image synthesis. Based on our approach, the model can be generalized to generating images with multiple objects and variable backgrounds.