The Three Vs of Fintech
This section explains how the Fintech industry will function and the key areas to focus on when a Fintech uses any kind of Artificial Intelligence or Machine Learning. The three Vs, or significant factors to keep in mind, are Voice, Vernacular and Video. Each is explained in detail below.
Table of Contents
- Future of Fintech
- Voice
- Video
- Vernacular
Future of Fintech
Understanding the factors influencing the Fintech business is crucial when discussing its future. While there have been numerous advancements in the past concerning loans, payment gateways and insurance products, we must keep the three Vs in mind while discussing some common characteristics that will drive the Fintech business. We have always heard that V stands for victory, but times have changed, especially in the fintech world, where V means Voice, V means Video and V means Vernacular.
These are the three Vs that will define the fintech segment in the future. While we have already seen many use cases for voice and video, we have yet to see extensive use of language, or the vernacular, in the Fintech space. We should look at this from the point of view of a country like India, which is very diverse in spoken and written languages. Keeping in mind that the Fintech industry will serve the bottom of the pyramid, or the unserved population in the country, it must look for a better communication system driven through these three Vs.
Voice
The time for sending text messages or emails has passed. People like to interact by talking, to save time and convey their expressions. Here, I do not mean an increase in human-to-human interaction; in my view, going forward, people will talk to computers or voice bots. Nowadays we have chatbots, where we can chat with automated machines; in the future, we will likely speak to bots about all queries related to product information, services offered or specific account information. These Artificial Intelligence-empowered bots will have complete information about your account and will be able to disseminate it while talking to you in the same style and manner as humans do. This can prove to be a very efficient and cost-effective tool for customer servicing and collecting dues.
As the technology for voice bots develops, the flow can be explained in a simplified manner, as shown in Figure 5.2. It is an over-simplified illustration of how unstructured information from the sentence spoken by the person (voice) is converted into structured information (text), by picking up the right keywords.
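To make the flow in Figure 5.2 concrete, here is a deliberately naive, keyword-driven sketch of that conversion, from a transcribed utterance to structured information. The intent labels, keyword lists and sentiment rule are illustrative assumptions, not any vendor's actual model:

```python
# A minimal, keyword-driven sketch of turning an unstructured utterance into
# structured information (intent, entities, sentiment). Real engines use
# trained models; the rules below are illustrative only.

INTENT_KEYWORDS = {
    "check_balance": ["balance", "account balance"],
    "track_order": ["order", "delivery", "shipment"],
    "loan_enquiry": ["loan", "emi", "interest rate"],
}

NEGATIVE_WORDS = {"angry", "late", "bad", "complaint"}

def extract_signals(text: str) -> dict:
    """Convert a transcribed utterance into structured information."""
    lowered = text.lower()
    intent = "unknown"
    for name, keywords in INTENT_KEYWORDS.items():
        if any(k in lowered for k in keywords):
            intent = name
            break
    # Very naive named-entity pickup: capitalised tokens after the first word.
    entities = [w for w in text.split()[1:] if w[:1].isupper()]
    sentiment = "negative" if NEGATIVE_WORDS & set(lowered.split()) else "neutral"
    return {"intent": intent, "entities": entities, "sentiment": sentiment}

if __name__ == "__main__":
    print(extract_signals("Where is my loan statement, Saarthi?"))
```

A production system would replace every rule here with a learned model, but the input/output shape, free text in, structured signals out, stays the same.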
Some of the major players in voice technology are:
- Google Contact Center AI
- Amazon Connect
- Nuance
- Talkdesk
- Haptik.ai
- Vernacular.ai
- Saarthi.ai
We need to understand the important factors to consider while creating any multilingual voice or conversational application (software system) using Artificial Intelligence (AI).
Understanding the language
Systems must convert speech or unstructured text into structured
information during this process. In this step, a language comprehension engine extracts various signals, such as the statement’s goal, named
entities, emotions and so on.
Players in the conversational AI sector use translation services to
translate non-English inputs into English. A multilingual discourse
cannot be delivered effectively using machine translation. Since
the source and target languages differ lexically, syntactically, and
semantically, information is lost, and inputs are misrepresented.
Without the aid of translation, India-based startup Saarthi.ai
developed a Native NLP stack over vast amounts of non-English data
to understand non-English inputs in more than 20 languages. They
claim that they have the advantage of realizing the full potential of
Conversational AI in non-English languages.
Processing the speech
Voice is a high-bandwidth channel, so conversational AI applications should be voice-first. For voice assistants on digital platforms and IVR, precise speech processing is essential. The challenge is to extract important semantic information from user voice input. Low-resource non-English languages are once again a problem. Due to the variety of dialects and accents used in non-English languages, accurate transcription of speech in such languages is very difficult. My interaction with Indian startup Saarthi.ai suggests that it has done away with the need for transcription by directly inferring semantic signals from the speech in the target language, such as domain, semantic roles, dialogue acts, contexts and intents. On a global benchmark dataset for speech commands, their research team has achieved above 96% accuracy in determining intent, action and location from noisy audio signals.
Naturalness of the conversation, voice or dialogue
Modeling intents and slots for virtual assistants has become the development standard across many industries today. The main problem with this strategy is that bots are taught to determine the optimal course of action based only on the user’s most recent sentence. Because the assistant is unaware of what came before the current dialogue stage, this causes tunnel vision. Secondly, when more than one action is involved in a sentence (most often referred to as multi-intent comprehension), system performance is severely constrained. To execute dialogue in real business use cases at levels comparable to human dialogue performance, a conversational AI assistant must
parse multiple semantic phrases in a sentence. It must recognize
roles, domains, business context, dialogue acts, belief states and
emotions, among other signals. Contact centers frequently have
reams of information about human-to-human conversations via phone recordings and live chat. The technology interface has to
handle all these issues for a better result for the user.
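The multi-intent problem described above can be sketched in a few lines: split the utterance into rough clauses and detect one intent per clause, rather than forcing a single label onto the whole sentence. The clause splitter and intent rules below are illustrative assumptions only:

```python
import re

# Naive multi-intent parsing: one utterance can carry several actions.
INTENT_RULES = {
    "pay_bill": "bill",
    "update_address": "address",
    "block_card": "block",
}

def detect_intents(utterance: str) -> list:
    # Split into rough clauses on conjunctions and punctuation.
    clauses = re.split(r"\band\b|[,;]", utterance.lower())
    found = []
    for clause in clauses:
        for intent, keyword in INTENT_RULES.items():
            if keyword in clause:
                found.append(intent)
    return found

detect_intents("Please pay my bill and block my card")
# → ["pay_bill", "block_card"]
```

A single-intent classifier would have to pick one of these two actions and silently drop the other; clause-level parsing preserves both.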
Automation of the process flow and language
Before a problem is solved, most questions and requests go through several steps and perhaps even several discussions. Therefore, rather than ignoring the domain ontology, Conversational AI systems must use it to be helpful. Some of the newer technology firms, like Saarthi.ai, have been developing enterprise- and domain-specific ontologies for telecom, lending, collection and e-commerce, while maintaining a highly replicable pipeline for other domains. These tech companies can automate close to 70% of contact center volume without manual assistance because of the advantages of conversation-based data modeling, continuously developing data ontologies and dialogue-policy-based conversation management.
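A domain ontology in this sense can be pictured as a small schema the assistant consults to know which entities and actions are meaningful in a given domain, here, a toy collections example. The structure, entity names and action names are illustrative assumptions, not Saarthi.ai’s actual schema:

```python
# A toy domain ontology for a lending/collections assistant: it declares the
# entities of the domain, their slots, and which entities each action needs.

LENDING_ONTOLOGY = {
    "entities": {
        "loan_account": ["account_id", "outstanding_amount", "due_date"],
        "payment": ["amount", "mode", "reference_number"],
    },
    "actions": {
        "remind_due": {"requires": ["loan_account"]},
        "record_promise_to_pay": {"requires": ["loan_account", "payment"]},
    },
}

def required_slots(action: str) -> list:
    """Slots the dialogue manager must fill before executing an action."""
    entity_types = LENDING_ONTOLOGY["actions"][action]["requires"]
    return [slot for e in entity_types
            for slot in LENDING_ONTOLOGY["entities"][e]]
```

With such a schema in place, the same dialogue engine can be pointed at telecom or e-commerce simply by swapping the ontology, which is the replicable-pipeline idea described above.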
Case study of Saarthi.ai
In this section, to make it more like a case study, I had a detailed interaction with the promoters of Saarthi.ai. Mr. Sangram Sabat, Co-founder and COO, gave a detailed overview of the entire process, workflow and the industry, which has made this section more like a story or a journey, one that incidentally matches my thoughts on how the Fintech industry is going to use Voice and Vernacular going forward.
Bharat — The unique land
Unity in Diversity has always been India’s identity. With a 1.4
billion population, India is almost a continent of various cultures,
languages, socio-economic groups, and preferences in food, clothing,
entertainment and more.
Culture runs deep in our veins. The only successful initiatives in India are the ones that cater to and have an intricate understanding of socio-cultural trends. India is undoubtedly one of the largest and fastest-growing consumer economies in the world, but a large portion of the country remains underserved even with the advent of disruptive technologies.
The great digital and geographic divide
There is a significant digital divide between the Tier 1 cities and
the lesser developed suburban and rural areas, which columnists
generally refer to as the divide between ‘India’ and ‘Bharat.’
The latter either do not have access to technology or are unskilled in using it, putting the country at a competitive and economic disadvantage. The demographic, especially the young populace, has low digital literacy, is vastly unskilled in Information Technology and is unable to take advantage of the vast amount of information online.
While “Artificial Intelligence,” “Internet of Things (IoT)” and
“Blockchain” are the talk of the town all around, “loan waivers,”
“subsidies” and so on remain the buzzwords in the 6,00,000 villages
and 7,935 towns of India. India is rising, but more than half the
population does not have access to the Internet, and many more are
not on the information highway that the rest of us are on. But, that is changing fast.
In India, access to public entitlement is hard; the poor and illiterate
get misguided quickly. Access to basic human necessities like
pension, daily wage, food, healthcare, and education is challenging.
Around 2012, the Ministry of Rural Development (MoRD) and the
Unique Identification Authority of India (UIDAI) signed an MoU to
integrate the MGNREGA process with Aadhaar. It was expected
that Aadhaar would soon facilitate a range of MGNREGA, banking,
insurance, and other services for rural citizens. However, it was
reported that compensation paid on time drastically declined from
50.1 percent in 2013–14 to 26.90 percent in 2014–15. In 2008, when
the central government had directed that all MGNREGA wages be paid through banks and post offices, the banks and post offices were unable to cope with the volume of payments. However, this and a host of other schemes are now improving thanks to internet penetration, the government’s efforts, and the JAM trinity (JAM is short for the Jan Dhan Yojana, Aadhaar and Mobile number combination).
Demand is not a problem in India, but supply and access are significant concerns. If information dissemination through technology is not intuitive and infrastructure does not evolve to accommodate the exponential surge of queries and requests, everything will fall apart. If we become a global superpower, our generation will tell the story of the great “Bharat”.
The rural uprising
The country’s condition is much better than many parts of the world like Sub-Saharan Africa, where people have to live with expensive data plans and smartphone prices, limiting both the coverage and usage of the Internet. In Sub-Saharan Africa, the cost of entry to low-level devices represents 375% of monthly income.
In India, cheaper smartphones and data plans have increased internet penetration. Although that’s largely been an urban phenomenon until now, the upcoming digital revolution will be a story of “the rise of the rural consumer.”
As per Nielsen’s Bharat 2.0 report, the number of active Internet users aged 12 years and above is 592 million. Compared with 2019, the active internet user base aged 12 years and above has shown impressive growth of about 37 percent. Rural users’ growth of 45 percent continues to outshine urban users’ growth of 28 percent over 2019, and also outpaces rural users’ own 2019 growth of 35 percent.
The other trend is related to online banking and digital payments.
A study revealed that the users of Online Banking and Digital Payments had been identified as urban, affluent users from NCCS A, with two-thirds of the users belonging to the age bracket of 20 to 39 years. While usage is higher in urban areas, 46 percent of these users are from rural India.
Voice for Bharat
With the various means to make technology available everywhere
in this fast-improving world, the focus is shifting towards making
technology humane and inclusive.
In this context, literacy, digital savviness and infrastructure challenges are major barriers. Voice-based, AI-assisted interactions with technology and services will level the playing field by removing the barriers of digital literacy and awareness. Voice-based payments and assisted banking are a rage amongst banks currently, although still in the nascent stages. But this is just the beginning of what can be achieved.
Anyone who can talk can benefit from Voice AI. An average person can speak five times as many words in one minute as they can type. It is the most natural and efficient mode of communication; text and touch were the initial modes only due to the lack of technology. Although it continues to be an extremely difficult topic of active research, the availability of computing power, data and focused research signals that the problem will be mostly solved for mainstream use, in domains like healthcare and finance, within this decade.
Besides that, the human voice carries orders of magnitude more signals than text. Voice not only conveys speech but also provides information on the speaker’s gender, demography, identity and emotional state. Such signals may be helpful for a variety of studies and use cases, from biometric verification to communicating with consumers in a situation-based, tailored way. Voice AI can also work over telephony, carrying services to rural areas or rough terrains with inadequate infrastructure or poor networks. The icing on the cake is perhaps the augmentation in customer experience: Voice AI can make any interaction engaging, contactless, fast and independent of rigid UI paths.
The language of technology
Water changes at every mile and language at every four miles in India.
The Internet mainly consists of websites built in English, with a minority in Japanese, German, French, Spanish, Portuguese or Chinese. Many parts of India have very few people who can understand English. More than 90% of internet users in India are non-English speakers. Many are neither as skilled in leveraging the internet nor as adept in interacting with a predominantly English web.
This is a global problem, as less than ten percent of people are English-literate, while the rest, more than three billion, speak languages that are sparsely represented on the web. Hindi is the mother tongue of close to 44% of Indians as per Census 2011; the rest of India speaks almost 120 other languages. As per census data released in 2018, no fewer than 19,500 dialects are spoken in India. If the statistics are mind-boggling, imagine how many users are marginalized because of the unavailability of technology in their language. Even if we do not consider all 120 languages, we know that there are at least 15 important and widely spoken languages, like Telugu, Tamil, Punjabi, Kannada, Malayalam, Marathi, Gujarati, Bangla and so on.
Enterprises attempting to enter or strengthen their hold on the Indian market should be aware of the country’s digital user categories and their language preferences. To the creators of software and technology, there isn’t an iota of doubt now that Natural Language Processing and Speech Processing are more relevant and significant than ever in influencing the lives of 20% of the global population. From India’s perspective, there are publications in local and regional languages, but there is hardly any process, system or technology to understand the spoken local or regional language.
We have all observed a difference in accent when a person from South India communicates in Hindi or a person from North India speaks in English. In many cases, even when the person is speaking correctly, we find it difficult to understand the words, meaning or intentions. Similarly, the tone of voice also changes how communication is perceived.
The mission for a vernacular and Voice technology company — Saarthi.ai
According to Curtis Kularski, “the digital divide is composed of a skill gap and a gap of physical access to Information Technology (IT) and the two gaps often contribute to each other in circular causation. Without access to technology, it is difficult to develop technical skill and it is obsolete to have access to technology without first having the skill to utilize it.”
Information and Communication Technologies (ICTs) are an irreplaceable tool in society. The diffusion of ICTs in the financial, educational and healthcare sectors has been transformational. In the absence of knowledge about using ICTs, the potential to generate a socio-economic impact from the young population will go in vain.
Saarthi.ai is driven to become the primary medium of interaction
for millions of people in India by implementing a Conversational AI platform to make the web more intuitive and reach regions where
the internet is scant through telecommunication via multilingual AI bots. They are making the power of the internet accessible to everyone in their native language. Traditional interactions with systems like apps, websites, social channels, and Interactive Voice Response systems are non-interactive, rigid and mainly in English. Conversational AI transforms these interactions into simple
conversations. Imagine if a farmer could call a number and talk on
the phone with a virtual agent to learn more about their agricultural credit scheme. Wouldn’t it be a boon for users who face digital exclusion? The company focuses on serving the language-diverse geographies of Asia Pacific, Europe, the Middle East and Africa, starting with India.
This is a matter of human rights when people cannot access pivotal technologies, leading to a divide between the “savvy” and the “non-savvy” and the threat of leaving behind billions of people on the fringes of complete digital exclusion.
In the past, initiatives have relied on employing humans in contact centers to interact with the language-diverse demography. However, catering to such a vast user base requires the intervention of AI, as a human-only solution puts tremendous pressure on the infrastructure. It also creates many low-value jobs that become obsolete as automation gradually takes over, thereby vastly underutilizing the true potential of human capital. India, the second most populous country, has the majority of its increasing population in the working-age group, giving it a strategic economic advantage. A larger working population implies more development, but this is possible only if the young population is adequately equipped to adapt to an exponentially evolving world.
The Current State of Technology
In my view, ‘Voice Bots’ will be the future of conversational engagement between a fan or follower and a celebrity or a brand. Famous sports players, movie actors, politicians and so on may hold two-way interactions through voice bots where the follower or fan won’t even realize that they are talking to a bot. Currently, these people interact with their fans and followers through social media platforms in text or typed communication managed by agencies.
Market leaders are waking up to build NLP systems focused on Indian languages, but cultural complexities and techno-human constraints are a massive barrier for Indic-language computing.
- Scale and diversity — India has grouped variations of languages written in 13 different scripts, and officially recognizes 22 major languages, along with a plethora of dialects. Thus, there is a need to develop approaches that can be generalized, so that scaling to multiple dialects is only a task of adaptation. To begin, voice communication must be built in English, Hindi and the most spoken regional languages.
- Code mixing — This is the use of more than one language in the same utterance, speech or text. Handling code-switching from one language to another in automatic speech-to-text, while understanding the language simultaneously, is very difficult.
- Resource scarcity — One of the most crucial revelations in Indic-language computing recently was the scarcity of data, which makes any movement in this sector difficult. Language computing uses sophisticated machine learning techniques that need large amounts of high-quality data. Take the example of automatic Machine Translation — the Hansard corpus for English-French contains 1.6 billion words, and even the WMT 15 data for English-Czech had about 16 million parallel sentences in 2019. At that time, one of the only meaningful examples of an Indian corpus was the CFILT-IITB En-Hi corpus, which has 800,000 parallel sentences. The situation is worse for other languages. For example, the available corpora for Sinhala-Tamil do not reach even 50,000 sentences.
- Lack of staged development of speech and Natural Language Processing tools — The NLP pipeline comprises several stages, from processing words (known as tokens) to classifying sentences to discourse computation. As the name suggests, the pipeline has many downstream tasks that are affected by the accuracies of earlier stages. Globally, since the language of business was English, all work on linguistic computational processing was done in that language. This led to a uniform and progressive development of NLP tools. In contrast, other languages do not even have basic morphology analyzers that split words into their roots and suffixes. Even where tools or algorithms exist, most of them are inaccurate.
- Absence of linguistics knowledge — While it may appear on the surface that Speech processing and NLP are only driven by humongous amounts of data and faster computation speeds, it will surprise many to know that many teams have at least one linguist with a deep understanding of language phenomena. This helps solve the problem of saturation in accuracy. It also helps design good strategies and make results more explainable. Such a linguistic tradition is absent in many languages.
- Script complexity and non-standard input mechanisms — In an Indic script such as Devanagari, there are 13 vowels, 33 consonants, complex conjunct characters, 12 vowel marks (matras) and special symbols (chandra bindu, anusvara and so on). Script complexity makes input speeds 2–3x slower than English. The presence of 13 different scripts aggravates this problem. To counter it, people work around with Roman inputs through transliteration.
- Non-standard transliteration — Although transliteration is widely used to input words across devices, it is far from standard. For example, the Hindi transliteration for “mango” (a fruit) can be “am,” “Am,” or “aam.” This creates even more complexity for the computational processing of language.
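One common workaround for the “am”/“Am”/“aam” problem above is to normalise romanised variants to a canonical key before any lookup or model step. The variant table below is a tiny illustrative assumption, real systems learn such mappings from data:

```python
# Romanised Hindi has no standard spelling, so variants are collapsed to one
# canonical form before further processing. Table is illustrative only.

VARIANTS = {
    "aam": ["am", "Am", "aam", "AAM"],      # "mango"
    "paisa": ["paisa", "pesa", "paise"],    # "money"
}

# Invert the table into a lookup: variant (lowercased) -> canonical form.
CANONICAL = {v.lower(): key for key, forms in VARIANTS.items() for v in forms}

def normalise(token: str) -> str:
    """Map a romanised token to its canonical spelling, if known."""
    return CANONICAL.get(token.lower(), token.lower())

normalise("Am")   # → "aam"
```

Everything downstream (search, intent detection, translation) then sees a single consistent spelling instead of three.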
Challenging language phenomena
Another language phenomenon common to major Indian languages is compound verbs. A compound verb is composed of two verbs, such that the main information content of the actual action is carried by the first verb (the polar), and the information for Gender/Number/Tense/Aspect/Modality is marked on the second one (the vector).
- Hindi sentence — Bol uthaa (Hindi string)
- General translation — Speak rose (gloss)
- Accurate English — Spoke up (English translation)
Here, the vector verb, the second word, carries a ‘feeling’ on top of the main action of speech. Catching such fine nuance is essential, for example, in emotion analysis.
Many Indian languages also show heavy stacking of morphemes.
- Marathi sentence — gharaasamorchyaanii malaa saaMgitle
- Morpheme breakdown — ghar+aa+samor+chyaa+nii+malaa+saMgit+le
- General translation — house+<oblique marker>+front+of+<ergative marker — agent> me told (gloss)
- Accurate English — The one in front of the house told me (English translation)
This needs sophisticated word segmenters and morphology analyzers. We can rest the case by concluding with one last nuance of language, “polysemy”, wherein the same word can vary in meaning based on the context. For example, “I have to study for the bar” vs. “I want a chocolate bar.” Besides these, multiple other problems make the development of pervasive technology extremely hard:
- Background Noise
- Inflections of words in speech
- Domain-specific words uttered in a non-English language
- Punctuation placements
- References to the past or a central subject in a conversation
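The “bar” polysemy example above can be sketched as a context-keyed sense lookup: score each candidate sense by how many of its context words appear in the sentence. The sense inventory and context words here are purely illustrative:

```python
# Toy word-sense disambiguation: pick the sense whose context words overlap
# most with the sentence. Sense inventory is an illustrative assumption.

SENSES = {
    "bar": {
        "law_exam":  {"study", "exam", "lawyer"},
        "food_item": {"chocolate", "eat", "snack"},
    },
}

def disambiguate(word: str, sentence: str) -> str:
    tokens = set(sentence.lower().replace(".", "").split())
    # Choose the sense with the largest context-word overlap.
    return max(SENSES[word], key=lambda s: len(SENSES[word][s] & tokens))

disambiguate("bar", "I have to study for the bar")   # → "law_exam"
disambiguate("bar", "I want a chocolate bar")        # → "food_item"
```

Real systems use contextual embeddings rather than hand-written context sets, but the principle, meaning depends on surrounding words, is the same.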
Conversational AI today is designed for English and relies on Machine Translation to deliver multilingual conversations. Translation leads to misrepresentation and loss of information in spoken language. It also corrupts the data, which is a huge blunder for any Deep Learning-based system. Language understanding engines in the market capture only 2-3 cognitive signals from even English messages. This makes them inept at crafting human-like conversations, as they do not understand many nuances of language. Thus, the current approaches are unreliable and limited in comprehending unstructured non-English conversations. This inhibits the adoption of technology and the inclusion of the people whom it would benefit.
Bridging the Gap
To understand the uniqueness and reliability of Saarthi’s voice-first Multilingual Conversational AI platform, one must first picture how an AI Assistant (a Dialogue System) functions. The figures shown above are oversimplified examples of how unstructured information from your sentence is converted to structured data. If no voice inputs are given, one can simply remove the Automatic Speech Recognition used for the Speech Processing part of these figures.
The structured information is then sent to a Dialogue Management
layer, which predicts the best possible action. Corresponding to the action, a response is retrieved from a set of responses. That response is then synthesized into speech, if necessary. Therefore, the following are paramount to any voice-first multilingual Conversational AI application:
- Language understanding
In this process, systems must transform unstructured text
or speech into structured information. This step makes use of a language understanding engine that extracts different signals like the purpose of the statement, named entities, emotions and so on. When it comes to processing multilingual “spoken language” inputs, which is the prevalent
form of communication for all service-related interactions,
incumbent systems use translation services to convert non-English inputs to English. It incurs a loss of information and misrepresentation of inputs due to lexical, syntactic and
semantic differences in the source and target language.
Saarthi.ai built a Native NLP stack over huge corpuses of
non-English data to natively understand non-English inputs
in 25 languages without the use of translation. Such Lossless
Multilingual Understanding gives them the edge to unlock the true potential of Conversational AI in non-English languages.
- Speech Processing
Precise speech processing is paramount to Voice Assistants on digital mediums and IVR. The problem involves identifying key semantic signals from the voice input of users.
A significant issue again arises with non-English languages, which are low-resource. Accurate speech transcription in non-English languages, across their different dialects and accents, is extremely difficult. A very simple example: if you say “Mera order kahan hai,” it might get converted to text as “Mera border kahan hai,” thereby corrupting the data sent downstream to the text-based natural language understanding system. A speech-based language understanding layer is not only free from such mix-ups; it opens up a universe of possibilities, as your speech may contain more than 100 cognitive pieces of information. Saarthi has eliminated the dependency on transcription and infers domain, semantic roles, contexts, intents and various other semantic signals directly from the speech in the native language. Thus, the system is a hybrid system where both speech and text inputs are leveraged to augment solution reliability.
- Enabling Natural Conversations
Today, the development standard across the industry is
to model intents and slots for virtual assistants. The most
significant issues with this approach are:
- Bots are trained to infer their response from only the current sentence of the user. This leads to tunnel vision, as the assistant has no idea what happened before the current state of the conversation.
- System performance is limited when multiple actions are present in the sentence, commonly known as multi-intent understanding.
- The system neglects that the same sentence can be used in different contexts.
- System failure often occurs with minor digressions. This is because the assistant doesn’t know what path the conversation should take.
To deliver near-human dialogue performance in real business use cases, an accurate Conversational AI Assistant needs to know how
to parse multiple semantic phrases in a sentence. It needs to identify
roles, domains, business contexts, actions requested, emotions,
dialog states and many other signals.
Contact centers often have a lot of human-to-human conversation data from live chat and call transcriptions. Saarthi’s data annotation system uses a novel annotation scheme that considers semantic phrases in a sentence and provides the flexibility to assign multiple labels on and within each phrase. Phrases in a sentence are linked to other parts of the conversation as well, through relations to the various goals that might be part of the conversation. This fine-grained data is then used to train an all-in-one model for richer language understanding and natural conversations. The learning achieved over whole conversations helps the assistant get back on track and traverse the best possible path to complete the goals identified from the user’s conversation, even with multiple digressions, and carry forward context to understand references.
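The idea of surviving digressions can be sketched as a belief state that accumulates goals and slots across turns, so a detour in the latest sentence does not erase the original goal. The class shape and field names below are illustrative assumptions:

```python
# Minimal belief-state tracker: conversation-level memory instead of
# sentence-level tunnel vision.

class BeliefState:
    def __init__(self):
        self.goals = []     # open goals, in order of appearance
        self.slots = {}     # slot values filled across the conversation

    def update(self, intent, slots):
        """Fold one turn's understanding into the conversation state."""
        if intent and intent not in self.goals:
            self.goals.append(intent)
        self.slots.update(slots)

    def pending_goal(self):
        """Oldest unfinished goal, even after digressions."""
        return self.goals[0] if self.goals else None

state = BeliefState()
state.update("reschedule_emi", {"loan_id": "LN-42"})
state.update(None, {})              # digression turn: no new intent detected
state.pending_goal()                # → "reschedule_emi" (goal survives)
```

A sentence-only assistant would have nothing to return to after the digression; here the first goal and its filled slots remain available.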
- Utility of automation
Due to unscalable data modeling techniques and insufficient focus on understanding whole conversations, many industry applications were earlier unsuitable for conversational AI-based automation. This is partly why we do not see assistants in environments where multiple tasks are carried out over long conversations. The other primary reason is that most applications try to take up generic processes across domains, such as meeting scheduling. This lets systems avoid the
complexities of domain ontology.
However, most queries and requests involve numerous exchanges and sometimes even multiple sessions of conversations before a resolution is made. So, to be truly useful, Conversational AI systems must use the domain
ontology rather than avoiding it. Saarthi has constantly been building up domain- and even enterprise-specific ontologies for telecom, healthcare, e-commerce and BFSI, and has a highly replicable pipeline for other domains.
- Cross lingual embeddings
With the commencement of the Information age and the
ubiquitous availability of media at the fingertips, the very
concept of language, culture, and communication has
undergone a huge change. Code-mixing and code-switching are results of this, and Hinglish is a prime example. Such language transformations arise out of language contact, which is very common in diverse language geographies. To handle this, Saarthi built models that can share and decode the vocabularies of multiple languages at once. These help compare the meaning of words across languages for cross-lingual information retrieval. They also enable Saarthi to share learnings between resource-rich and resource-low languages, as the learning and representations are common across multiple languages.
- Transfer learning
Building any task-oriented virtual assistant requires a lot
of human-labeled data, making technology development
slower and more cumbersome. To work around this, besides
changing the data modeling technique, Saarthi.ai transfers
learning between languages and domains. Their models
are pre-trained and start working with some amount of fine-tuning. This helps Saarthi’s systems work with 10–15% of
labeled data required by other comparable systems, making
them more agile. This also helps adapt technology within
language groups and dialects once the acceptable accuracies
in the parent language have been achieved.
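The transfer-learning recipe above can be shown in miniature: a “pre-trained” feature extractor is frozen and shared across languages, and only a small head is fitted on a handful of labelled examples. This pure-Python stand-in replaces what a deep-learning framework would do; the shared feature table is an illustrative assumption:

```python
# Frozen "multilingual" features shared across English and Hindi tokens.
# In reality these come from pre-training; here they are hand-set to
# illustrate that only the small head needs (re)training.
PRETRAINED_FEATURES = {
    "balance": [1.0, 0.0], "शेष": [1.0, 0.0],   # Hindi for "balance"
    "loan":    [0.0, 1.0], "कर्ज": [0.0, 1.0],   # Hindi for "loan"
}

def featurise(text):
    vec = [0.0, 0.0]
    for tok in text.split():
        f = PRETRAINED_FEATURES.get(tok, [0.0, 0.0])
        vec = [a + b for a, b in zip(vec, f)]
    return vec

def fit_head(examples):
    """Fit a tiny nearest-centroid head on a few labelled examples."""
    grouped = {}
    for text, label in examples:
        grouped.setdefault(label, []).append(featurise(text))
    return {lbl: [sum(c) / len(vs) for c in zip(*vs)]
            for lbl, vs in grouped.items()}

def predict(head, text):
    v = featurise(text)
    return min(head, key=lambda l: sum((a - b) ** 2 for a, b in zip(v, head[l])))

# Train the head on English only; the shared features carry it to Hindi.
head = fit_head([("balance", "check_balance"), ("loan", "loan_enquiry")])
predict(head, "कर्ज")   # → "loan_enquiry", learned from English examples
```

Because the heavy lifting lives in the shared representations, the head needs only a fraction of the labelled data a from-scratch model would, which is the 10–15% effect described above.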
Pervasive innovation is not a result of a single technological
improvement but multiple nuanced and novel improvements
in various technologies working in unison.
They have devised ways to thread conversations across channels and devices. Deep Learning is notorious for the huge size of its models, which makes them difficult for enterprises to use due to cost and responsiveness issues. Through knowledge distillation, Saarthi.ai compresses model sizes by up to 7 times. This helps them traverse the lab-to-enterprise journey more quickly, save enterprises' expenses and enable a better user experience. While working in the financial sector, especially in lending and collections, they have worked to build a system that can predict the precise offers or recovery strategies that work for different personas.
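Knowledge distillation, mentioned above, trains a small "student" model to match the softened output distribution of a large "teacher", so most of the teacher's behavior survives in a far smaller network. The following is a minimal sketch of the generic distillation loss, not Saarthi.ai's proprietary implementation:

```python
import math

def softened_softmax(logits, temperature):
    # Dividing logits by a temperature > 1 softens the distribution,
    # exposing the teacher's relative preferences ("dark knowledge").
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # KL divergence between the teacher's and the student's softened
    # distributions: minimizing it teaches the student to mimic the teacher.
    p = softened_softmax(teacher_logits, temperature)
    q = softened_softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
```

In practice this term is blended with the ordinary cross-entropy on the true labels; the student architecture can then be several times smaller than the teacher while retaining most of its accuracy.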
The combined advantages of these innovations and the data accumulated allow them to automate more than 70% of contact-center conversations with customers without any manual intervention. This enables enterprises to achieve significant operational efficiencies and cost reductions and helps them implement the technology at a much broader scale.
When you can interact with a user in their language based
on a myriad of conversational signals and history, imagine
the powerful and inclusive experiences you can create! We
are still far from a future where technology is more humane.
Still, solid foundations are being built for various domains to
communicate with users in every nook and corner of India.
Besides all these innovations, they are also forming alliances with various academic and government bodies, like the FICCI Indian Language Internet Alliance (ILIA). Government policies on data sharing to support Indic-language study, computing grants and subsidies for companies to experiment, and so on have all been implemented to date. They have also contributed to the cause with their benchmark datasets in many languages.
There is also an imminent need to re-skill the workforce as traditional roles vanish. The rapid emergence of new technologies generates the need for new skills, creating a huge skill gap. About 40 percent of India's workforce must be re-skilled over the next five years to cope with emerging trends such as AI, IoT, machine learning and blockchain. There is huge despair over job loss due to automation, as firms across industries are laying off almost 50% of their workforce, most of them knowledge workers and contact-center employees. According to the World Economic Forum report titled 'The Future of Jobs 2018', the Fourth Industrial Revolution will make 75 million jobs obsolete by 2022 but will also create 133 million new jobs, a net gain of 58 million. Saarthi will contribute to the re-skilling initiative by:
- Partnering with existing re-skilling ventures to offer contracts to people on data-related tasks and helping subsidize the education necessary for re-skilling.
- Making their AI platform more layman-friendly to aid knowledge workers and contact-center employees in building robust solutions for the rising market demand.
This two-pronged approach will boost existing nascent roles and generate new employment opportunities for India's massive working population.
A strong and sustainable Bharat
Voice-first Conversational AI will provide the impetus to digitalize India. Significant leaps in deciphering natural language, enabled by deep learning and AI, will pave the way for the localization of content and services to make the internet a pleasant experience for Indian-language users.
Language and other barriers to ICTs impact users' understanding of privacy policies and their ability to protect their identities and sensitive information. It is scary to think how this could lead to disparities in information access for speakers of diverse languages in critical situations, such as those prevalent in healthcare. The current localization situation on the internet reflects how colonialism and cultural imposition have shaped the language landscape worldwide. These differences should not be a barrier to using tools and services that protect people and make their lives easier. Localizing technology can even play a role in ensuring that native languages survive.
Video
Video will be the second way for Fintech players to disseminate information and communicate. The time has gone when we used to send long notes, detailed paragraphs or emails to explain products, services or even information about your accounts. The trend of providing information through short videos has already started. Please remember that these videos are of very short duration, generally between 30 seconds and three minutes. These self-explanatory videos are good enough to explain each subject, product, service, policy or process. People can capture the entire information within a short span of time because these videos are created using multimedia technology.
We have discussed and explained Voice in detail above. Video is another crucial area that will shape fintech's future. A lot of communication, especially training and explanatory communication, will happen through videos. Video communication, especially two-way video, helps both sides not only communicate the message but also read the body language of the other person. During the Covid-19 pandemic, we all experienced the increased use of video meetings through Google Meet, Zoom and other platforms. While no other specific technology is required for video communication, it is a mix of text and voice communication and can also be done in local and regional languages, thus addressing the Vernacular aspect as well. In the financial market, we have already seen video communication used to meet regulatory compliance requirements, for example, video KYC done through various agencies in India empaneled with regulators like the Reserve Bank of India or SEBI. Let us understand the use case of Video in the Financial Services or Fintech industry.
Recently, the RBI approved Video KYC as a method for remote customer verification in the banking and financial services industries, supported by Fintech entities. The financial services industry, particularly NBFCs, PPIs and smaller fintech startups operating on shoestring budgets, found it challenging to grow their reach after the Supreme Court prohibited Aadhaar-enabled e-KYC from being used to authenticate the identity of consumers. The non-banking companies that sought to serve the unbanked people of rural India were likewise in transition, with physical KYC serving as the only tool for proving a customer's identity. This recent action by the RBI will provide a tremendous boost to the neo-banks and Fintechs that rely on digital channels for client service. A seamless, paperless, presence-free and cost-effective KYC solution will be created by combining the V-CIP (Video-based Customer Identification Process) with the offline KYC mechanism based on Aadhaar (UIDAI). This will benefit both the industry participants and the customers who have previously suffered from lengthy, ineffective, inefficient and burdensome KYC processes. By reaching their potential consumers, the financial services sector, which includes lenders and payment providers targeting rural unbanked groups, is also anticipated to experience higher market penetration. The tech-loving millennials and Gen-Xers who have grown accustomed to having the world at their fingertips may find this option particularly appealing. A mobile-first approach is picking up across the Fintech segment to meet regulatory compliance. An alternative to physical and digital KYC is 'Video KYC' or paperless KYC, which can be carried out with the customer's permission. However, it is exclusively the responsibility of the Regulated Entities (REs, which are registered and regulated by institutions like the RBI and SEBI) to ensure the procedure's integrity. As instructed by the RBI, activity records with the credentials of the official or business correspondent conducting the video KYC must also be kept. Additionally, the KYC can be declared complete only after a contemporaneous audit of the account. The NBFC or Bank, not any other service provider, must execute the audio-visual interaction for liveness detection.
While the video KYC process is now widely used, a few challenges must be taken care of while building the tech platform for your Fintech. Some of them are:
- Preventing location spoofing: The Video KYC recordings
must be precisely geotagged to confirm that the person
whose KYC is being completed is in India and at that specific
location.
- End-to-end encryption: The video recording must be fully encrypted from beginning to end and securely saved on the lender's cloud server. The date and time of the recording should also be included so it is simple to retrieve for thorough auditing when needed.
- High-quality picture: To facilitate information parsing and verification, OCR and image-processing algorithms should ensure that the photographs taken by RBI-registered entities are of high quality, to avoid any dispute subsequently.
- Facial recognition: The face-matching algorithms, based on AI/ML, should ensure that the person whose video is recorded for the KYC is the same person whose details are provided, resulting in a system that is impregnable against fake identities. While the face match may not be 100% against government-provided identity documents like PAN or Aadhaar, a match of 70% or above is sufficient to establish the identity.
- Liveness detection: Random activities during the video recording, such as head, eye and lip movements and interactions, ensure that no previously recorded videos are used during the V-CIP process. This could be a 5-to-10-second activity where the person is asked to read out a text on the screen.
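Taken together, these checks amount to a simple gate in the tech platform: every automated check must pass before the KYC can move to audit. The sketch below is illustrative only; the field names and the rough India bounding box are assumptions for the example, not RBI specifications.

```python
def passes_video_kyc_checks(session):
    """Return True only if every automated V-CIP check passes.

    `session` is a dict of results collected during the video call;
    all field names here are hypothetical.
    """
    lat, lon = session["geotag"]
    # Rough bounding box for India, to flag obvious location spoofing.
    in_india = 6.0 <= lat <= 37.0 and 68.0 <= lon <= 98.0
    # A 70%-or-better face match against the ID document is treated
    # as sufficient, per the threshold discussed above.
    face_ok = session["face_match_score"] >= 0.70
    live_ok = session["liveness_passed"]      # random prompt completed
    stored_ok = session["encrypted_at_rest"]  # recording encrypted and timestamped
    return in_india and face_ok and live_ok and stored_ok
```

Even when all automated checks pass, the KYC is complete only after the contemporaneous audit required by the RBI.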
While many Fintech entities provide the preceding facility, a growing lending institution must consider building it in-house.
Video as a Tool for Branding and Marketing
Top fintech firms have long incorporated animated videos into their marketing plans. In recent years, that pattern has become more prevalent, and many more businesses are doing the same. Such videos are used to showcase products and specific product features, or to introduce a new website or the latest mobile application. These videos help achieve multiple objectives.
There is a significant probability that your marketing approach is comparable to your rivals'. You are producing the same kind of content and engaging in the same activity; however, you will get noticed in the crowd if you build good video content, using vernacular languages as well. You must differentiate yourself from the competition to give your brand growth, trust and sustainability. Organizations can stand out from the competition thanks to the creative industries, which include animation. Animation allows you to communicate engaging stories, create a unique visual identity and distinguish your business from competitors. If your brand stands out, it can grow more quickly when the correct marketing plan supports it.
Videos or animations draw attention almost immediately. This is the rationale behind the widespread use of fintech explainer videos in online advertisements by major financial institutions today: they have very little time to grab the audience before getting skipped. A fundamental advantage of animated video is its capacity for grabbing attention, which makes this type of material particularly effective with the younger audience that is constantly on the go and has a short attention span. While videos used to be 2-5 minutes or longer, the attention window has been reduced to less than one minute. Delivering a pertinent message comes next, which a short, animated video can do very well. Fintech explainer videos have long been used as a powerful storytelling tool. They can deliver stories in a more engaging and immersive way with beautiful pictures. The audience then hears your message and responds to it, consciously or unconsciously. Greater engagement is the cornerstone for creating a brand identity.
This benefit of video marketing, leaving a long-lasting impact, is as straightforward and obvious as they come. Due to their superb visuals and storytelling components, fintech explainer videos have better retention value. This streamlines the task of branding and marketing professionals.
You can use this to build strong brand identification, and therefore trust, if people remember your fintech video, message and name for longer. In fact, animated Fintech explainer videos are excellent for this. A vibrant marketing video can be used to give viewers more information about your product, including an explanation of its characteristics, advantages and recommended uses. All of this strengthens your value proposition, opening up the potential for brand creation.
Tips For Effective Fintech Videos For Branding
Your fintech videos should be tantalizing on an emotional level. One method to achieve this is to tell an engaging story that people can relate to and identify with, pertinent to their lives. Videos that tell emotional stories will leave a deeper and longer-lasting effect. Additionally, they will have a more significant impact in getting the audience to take the action that a Fintech expects. Make sure to speak with the scriptwriter or author in depth if you are dealing with a provider of video services. If you are producing animated fintech videos internally, hire a talented copywriter. Explain your fintech brand's basic principles, the firm's purpose and the problems it solves in the stories.
You need an animated brand logo that works with and enhances your financial video marketing initiatives, just like well-known companies such as Paytm, Cred and VISA. This animated logo can be used at the video's start or outro, or even in the middle. It can strengthen the identity and recall value of your financial brand name. It is neither difficult nor prohibitively expensive to convert your current logo to animation.
Effective animated video is not only about pretty pictures. Its appeal and impact can be increased with the correct audio, which you can also employ for branding purposes. One video can use multiple voice-overs to create local regional appeal, producing the all-important vernacular impact.
Competitive analysis is one of the essential components of a fintech video marketing strategy. It can help you create your KPIs, establish benchmarks and spot consumer trends.
Analyzing your rivals also reveals a wealth of suggestions for the type of animated fintech videos you should create, the ideal length of your fintech videos, the best distribution method and more. Decide who your primary and closest rivals are. After that, examine their tactics and results to glean concepts and sources of inspiration for your branding approach.
All the videos created for branding and marketing must be created or dubbed in the local and regional languages; the Vernacular is very important, especially in countries like India. To summarize, voice, video and vernacular will drive Fintech growth in India and Bharat.
Vernacular
The third and last V is vernacular. While this may be applicable to the global Fintech industry, it is especially essential and relevant for India. India is a vast country with multiple regional languages. Any communication, whether written or verbal, is more effective with people or customers in the local or regional language. Communication done in a local or regional language will have more impact and be more result-oriented because it connects the customer with you directly and also creates an emotional bond.
While there are 22 official languages in India, the bigger states or geographical areas speak Hindi, Bengali, Kannada, Malayalam, Marathi, Punjabi, Tamil, Telugu and Odia. These cover most people; hence, all Voice and Video communication should look to use these languages, though the choice may also depend on the geographical area being serviced by a Fintech. The language (Vernacular) also has a social and emotional connection with the service provider. It creates belongingness and comfort between the customer or user and the Fintech that communicates with the customer for lending services, collections, and other products or services.
From a compliance point of view, the RBI mandates that the loan agreement, sanction terms and all other important aspects be explained to customers in a language the borrower understands. Lenders need to take a 'Vernacular Declaration' confirming that everything has been explained to the borrower in a language they understand.
Conclusion
In this section, we have learned about the future of Fintech, focusing on the lending industry. It will be essential to use technology in the Voice, Video and Vernacular space, which will improve the efficiency of Fintechs not only from a sales and marketing perspective but also for better compliance. Considering India's size and diversity, this technology will play a major role in the growth of any Fintech in the future. We shall also witness new startups in this segment offering customized solutions to the Fintech industry and other industries. We have now learned about building a solid and secure tech platform for a Fintech in the lending space and the importance of the three Vs. In the next section, we shall discuss the 'Investment pitch.' We will also cover the essential aspects to be taken care of when you pitch to various kinds of prospective investors.