
Why Every Business Needs IoT Analytics Software: Key Benefits and Use Cases

In today’s digital age, the Internet of Things (IoT) has become increasingly prevalent in various industries. With the ability to connect and collect data from a wide range of devices and sensors, businesses now have access to vast amounts of valuable information. However, making sense of this data can be a challenging task. This is where IoT analytics software comes into play. In this article, we will explore the key benefits and use cases of IoT analytics software and why every business should consider implementing it.

Improved Decision-Making

One of the primary benefits of incorporating IoT analytics software into your business operations is improved decision-making. With the ability to analyze real-time data from connected devices, businesses gain valuable insights that can help them make informed decisions. For example, in manufacturing industries, IoT analytics software can provide information about machine performance and identify potential issues before they cause significant downtime. This enables proactive maintenance planning and reduces operational costs.
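The sketch below shows, roughly, how such software might spot a developing machine issue. It raises an alert when the rolling average of a machine's vibration readings crosses a limit. The readings, the five-reading window and the 6.0 mm/s threshold are hypothetical values chosen for the example. Commercial IoT analytics products use far richer models than a single threshold.

```python
from collections import deque

def rolling_alerts(readings, window=5, threshold=6.0):
    """Return the indices at which the rolling mean of the last `window` readings exceeds `threshold`."""
    recent = deque(maxlen=window)  # keeps only the most recent `window` values
    alerts = []
    for i, value in enumerate(readings):
        recent.append(value)
        # Evaluate only once a full window of readings is available.
        if len(recent) == window and sum(recent) / window > threshold:
            alerts.append(i)
    return alerts

# Hypothetical vibration readings (mm/s) from one machine, drifting upward over time.
vibration = [4.1, 4.3, 4.0, 4.2, 4.4, 5.8, 6.9, 7.7, 8.4, 9.1]
print(rolling_alerts(vibration))  # [8, 9]
```

Averaging over a window keeps a single noisy spike from triggering maintenance. A sustained upward drift still produces an alert before the readings reach their peak.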

Furthermore, with IoT analytics software, businesses can identify patterns and trends in consumer behavior. By analyzing customer data collected from various devices, such as smartphones or wearables, companies can gain a deeper understanding of their target audience’s preferences and tailor their marketing strategies accordingly. This allows for more targeted campaigns that drive better results.

Operational Efficiency

Another key benefit of IoT analytics software is improved operational efficiency. By analyzing data collected from connected devices throughout the supply chain or production process, businesses can identify bottlenecks or inefficiencies that may hinder productivity. For instance, in logistics operations, real-time tracking through IoT sensors enables businesses to optimize routes based on traffic conditions or delivery schedules.

Moreover, with the help of predictive analytics provided by IoT analytics software, businesses can anticipate equipment failures or maintenance needs before they occur. This allows for planned downtime instead of unexpected breakdowns that disrupt operations. By ensuring optimal usage of resources, businesses can minimize downtime, increase productivity, and ultimately reduce costs.

Enhanced Customer Experience

In today’s highly competitive business landscape, providing an exceptional customer experience is crucial for retaining and attracting customers. IoT analytics software plays a vital role in enhancing the customer experience by enabling businesses to personalize their offerings based on individual preferences and behaviors.

For instance, in the retail sector, IoT analytics software can analyze data collected from smart shelves or beacons to understand customer shopping patterns. This information can then be used to send personalized offers or recommendations to customers while they are in-store or even after they have left. By delivering targeted promotions or suggestions that align with customers’ interests, businesses can improve customer satisfaction and loyalty.

New Revenue Opportunities

Lastly, IoT analytics software opens up new revenue opportunities for businesses. By leveraging the insights gained from analyzing IoT data, companies can develop innovative products or services that cater to emerging market needs. For example, in the healthcare industry, wearable devices connected through IoT enable continuous monitoring of patients’ health conditions. With the help of analytics software, this data can be analyzed to detect early signs of deterioration or provide personalized treatment plans.

Furthermore, with the rise of subscription-based models and data monetization strategies, businesses can leverage their IoT data by offering valuable insights as a service to other organizations. This creates a new revenue stream while also fostering collaboration and innovation within industries.

In conclusion, IoT analytics software is essential for every business looking to thrive in today’s digital era. From improved decision-making and operational efficiency to enhanced customer experiences and new revenue opportunities, the benefits are vast. By harnessing the power of IoT data through analytics software, businesses can gain a competitive edge and unlock new possibilities for growth and success.

This text was generated using a large language model, and select text has been reviewed and moderated for purposes such as readability.



5 Big Data Use Cases in Banking and Financial Services

According to TopPOSsystem, over 90% of companies believe that Big Data will revolutionize their business before the end of this decade. These Big Data use cases in banking and financial services will give you an insight into how big data can make an impact in the banking and financial sector.

In every industry and sector, you will find people talking about data. Data warehouses are being migrated to Hadoop-based big data systems using Sqoop and then analyzed there.

According to research by SINTEF, 90% of all data was generated in just the last two years.

Every sector has loads of data, and all companies need to do is analyze it to get useful results. This is especially true of the banking and financial sector, where there is a lot of scope for big data and where institutions have already started to benefit from it.

In this blog post, I am going to share some Big Data use cases in banking and financial services. If these sectors apply Big Data and related technologies in these niches, they can expect good results and better customer valuation.

Here are some of the common problems the banking sector faces despite having huge amounts of data at hand.

5 Top Big Data Use Cases in Banking and Financial Services

In fact, Big Data can be used in every area of the banking and financial sector, but here are the top 5 areas where it works especially well. Use these Big Data use cases in banking and financial services to solve problems or improve processes in these sectors.

These techniques apply to personal finance as well as to large-scale banking system applications. If you're working at a small scale, you can use the personal finance calculator to plan your personal finances better.

1. Customer Segmentation

Segmentation is categorizing the customers based on their behavior. This helps in targeting the customer in a better way.

Banks are now moving from a product-centric to a customer-centric model, so targeting individual customers is of the utmost importance.

Big data analysis helps them learn details such as demographics, transaction history and personal behavior. Based on this data, banks can build separate lists of customers and target them according to their interests and behavior.

Here is a simple customer segmentation analysis-
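The segmentation idea can be illustrated with a from-scratch k-means sketch. It is a generic clustering technique, not a specific bank's method. Each customer is described by two hypothetical features: monthly transaction count and average balance (in thousands). The data is invented for illustration.

```python
import random

def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means on 2-D points; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen customers
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each customer joins the nearest centroid (squared Euclidean distance).
        for i, (x, y) in enumerate(points):
            labels[i] = min(range(k), key=lambda c: (x - centroids[c][0]) ** 2 + (y - centroids[c][1]) ** 2)
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = (sum(p[0] for p in members) / len(members),
                                sum(p[1] for p in members) / len(members))
    return centroids, labels

# Hypothetical customers: (transactions per month, average balance in thousands)
customers = [(5, 1.0), (6, 1.2), (4, 0.8), (40, 20.0), (42, 22.0), (38, 19.0)]
centroids, labels = kmeans(customers, k=2)
print(labels)
```

In practice, features on different scales should be standardized first. The number of segments is usually chosen by comparing several values of k.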

2. Personalized Marketing

Personalized marketing is the next step beyond highly successful segment-based marketing. In segment-based marketing, customers are divided into segments based on certain parameters and then followed up accordingly to convert them into sales.

Companies can also take data from customers’ social media profiles and run sentiment analysis on it to learn about their habits and interests.

A further risk assessment can then be done to decide whether or not to go ahead with a transaction.
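A minimal lexicon-based sketch of the sentiment step is shown below. It counts positive and negative words in a customer's posts. The word lists and posts are hypothetical. Real sentiment analysis typically relies on trained models or curated lexicons rather than a handful of keywords.

```python
import re

# Hypothetical word lists for illustration only.
POSITIVE = {"love", "great", "easy", "helpful", "fast"}
NEGATIVE = {"hate", "slow", "fees", "broken", "annoying"}

def sentiment_score(text):
    """Count positive word hits minus negative word hits in `text`."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

def label(text):
    score = sentiment_score(text)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

# Hypothetical public posts from one customer.
posts = [
    "I love how easy the new app is",
    "Hidden fees again, so annoying",
    "Visited the branch today",
]
print([label(p) for p in posts])  # ['positive', 'negative', 'neutral']
```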

3. Risk Management Analysis

Every business involves risk, but a risk assessment helps a bank know its customers better. Risk management analysis is one of the key areas where the banking sector can protect itself from fraud and unrecoverable losses.

For this, Big Data technologies like Hadoop are a good fit. Gather the customer's past records, such as loan data, credit card history or background data, and analyze whether they can pay for the service they are looking for.

Here is the current risk assessment graph of various major banks-
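Here is a toy points-based scorecard that turns a customer's past records into a single risk score. All the points, thresholds and the cutoff are hypothetical. Real scorecards are fitted to historical default data. At Hadoop scale, the same per-customer logic would run as a distributed job.

```python
# Hypothetical points and cutoff, for illustration only.
def risk_score(applicant):
    """Higher score = riskier. Combines simple facts from a customer's past records."""
    score = 25 * applicant["missed_loan_payments"]
    if applicant["credit_utilization"] > 0.8:   # share of credit card limit in use
        score += 15
    if applicant["debt_to_income"] > 0.4:       # monthly debt payments / monthly income
        score += 20
    if applicant["years_with_bank"] >= 5:       # a long relationship lowers the score
        score -= 10
    return score

def decision(applicant, cutoff=30):
    return "manual review" if risk_score(applicant) >= cutoff else "approve"

low_risk = {"missed_loan_payments": 0, "credit_utilization": 0.3, "debt_to_income": 0.2, "years_with_bank": 6}
high_risk = {"missed_loan_payments": 1, "credit_utilization": 0.9, "debt_to_income": 0.5, "years_with_bank": 1}
print(decision(low_risk), decision(high_risk))  # approve manual review
```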

4. Fraud Detection

Credit and debit card fraud affecting millions of customers has recently been in the news, and several users have also found fraudulent activity on their accounts. Much of this could have been prevented with the help of big data and machine learning. The graphic below by IBM shows how fraud can be detected with predictive analysis.
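A far simpler technique than predictive modeling, a statistical outlier rule, can illustrate the basic idea of flagging unusual activity. The sketch flags a new card charge that sits more than three standard deviations from the customer's past average. The history, new charges and cutoff are hypothetical.

```python
from statistics import mean, stdev

def flag_unusual(history, new_amounts, z_cutoff=3.0):
    """Flag new charges more than `z_cutoff` standard deviations from the customer's past average."""
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return []  # no variation in history, so this rule cannot judge
    return [amt for amt in new_amounts if abs(amt - mu) / sigma > z_cutoff]

# Hypothetical past card charges for one customer, then two new charges.
history = [42.0, 38.5, 55.0, 47.2, 40.0, 51.3, 44.8, 39.9]
print(flag_unusual(history, [46.0, 980.0]))  # [980.0]
```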

5. Compliance Requirements

Banks and financial services firms need to carry out regular compliance checks and audits of their data, finances and other operations.

They fall under regulatory bodies that require data privacy, security and so on. Big data analysis can again help by identifying situations in which a financial crisis or security issue could occur. This helps banks and the financial sector avoid compliance and regulatory issues.
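As one illustration of compliance-oriented analysis, the sketch below sums each customer's cash deposits per day and flags totals above a reporting threshold. This catches several small deposits that individually stay under the limit. The threshold, customers and amounts are hypothetical. Actual reporting rules depend on the jurisdiction and regulator.

```python
from collections import defaultdict

REPORTING_THRESHOLD = 10_000  # hypothetical daily limit; real thresholds vary by jurisdiction

def over_threshold(transactions, threshold=REPORTING_THRESHOLD):
    """Return (customer, day) pairs whose summed cash deposits exceed `threshold`."""
    totals = defaultdict(float)
    for customer, day, amount in transactions:
        totals[(customer, day)] += amount
    return sorted(key for key, total in totals.items() if total > threshold)

deposits = [
    ("C1", "2024-03-01", 4000), ("C1", "2024-03-01", 3500), ("C1", "2024-03-01", 3000),
    ("C2", "2024-03-01", 9000), ("C2", "2024-03-02", 2000),
]
print(over_threshold(deposits))  # [('C1', '2024-03-01')]
```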

Banks have already started using Big Data to analyze the market and customer behavior, but a lot still needs to be done.

From the customer, business and compliance points of view alike, such analysis is of the utmost importance. Big data service providers have a great opportunity to grab this market and take it to the next level. Many improvements are still needed in Merchant Account Solutions and in the credit card segment, such as wireless credit card readers and credit card swipers, to make them secure and convenient for users.

I hope you liked these Big Data use cases for banking and financial services. If you know of any other segment where big data can be used at a broad scale, please add it.


Big Data in the Banking Industry: The Main Challenges and Use Cases

The article was updated on August 23, 2023.

Have you ever thought about the amount of data you create every day? Every credit card transaction, every message you send, even every web page you open… It all adds up to 2.5 quintillion bytes of data that the global population produces every day.

This opens endless opportunities for the most forward-thinking businesses across a number of domains to capitalize on that data, and the banking industry is no exception.

With digital banking used by almost half of the world’s adult population, financial institutions have enough data at hand to rethink the way they operate and become more efficient, more customer-centric and, as a result, more profitable. The question is: how do you get the most out of your data to keep up with the competition?

In this article, we will talk about common use cases for big data in banking (with real-life examples).

To start with, let’s take a step back to see the bigger picture and talk about the role of big data in banking.

  • The importance of big data in banking: The main benefits for your business

The big data analytics industry is huge, and there is no indication that its growth will slow in the foreseeable future. The projected revenue of over $308 billion in 2023 will likely double in the next six years, exceeding $655 billion by 2029.
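To put "double in the next six years" in annual terms, the compound annual growth rate implied by the two figures above can be computed directly. Only the $308 billion and $655 billion figures come from the text. The function is a generic CAGR formula.

```python
def cagr(start, end, years):
    """Compound annual growth rate implied by growing from `start` to `end` over `years`."""
    return (end / start) ** (1 / years) - 1

# Figures from the paragraph above: $308 billion (2023) to $655 billion (2029).
rate = cagr(308, 655, 2029 - 2023)
print(f"{rate:.1%}")  # 13.4%
```

In other words, the projection assumes the market grows by roughly 13% per year.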

It comes as no surprise that banking is one of the business domains that makes the highest investment in big data and BA technologies. Analysts have calculated that it is the segment of banking, financial services, and insurance, commonly known as BFSI, that accounts for the largest share (23%) of revenue in the big data analytics market, which is directly linked to the ever-growing customer base.

The benefits of big data in banking are pretty clear:

  • Big data gives you a full view of your business, from customer behavior patterns to internal process efficiency and even broader market trends. This means you can make informed, data-driven decisions and, in turn, achieve better business results.
  • It allows you to optimize and streamline your internal processes with the help of machine learning and AI. As a result, you get a significant performance boost and reduced operating costs.
  • Big data in banking and financial services is pivotal to improving client satisfaction, as data has always been the primary resource for developing and offering personalized solutions to each client.
  • Customer segmentation in banking using big data can help classify clients based, for instance, on their financial activities. This can be the foundation for analyzing client behavior, launching more effective marketing campaigns and finding the most efficient approach to different groups.
  • Big data in banking can also be used to develop algorithmic trading systems that make more efficient trading decisions faster and more consistently than human traders.
  • Big data analytics in banking can be used to enhance your cybersecurity and reduce risks. By using intelligent algorithms, you can detect fraud and prevent potentially malicious actions.


  • Big data challenges in banking

On the other hand, there are certain roadblocks to big data implementation in banking. Namely, some of the major big data challenges in banking include the following:

Legacy systems struggle to keep up

The banking sector has always been relatively slow to innovate: 92 of the world’s top 100 banks still rely on IBM mainframes in their operations. No wonder fintech adoption is so high. Compared to customer-centric and agile startups, traditional financial institutions stand no chance.

However, when it comes to big data, things get even worse: most legacy systems can’t cope with the growing workload. Trying to collect, store, and analyze the required amounts of data using an outdated infrastructure can put the stability of your entire system at risk.

As a result, organizations face the challenge of growing their processing capacities or completely re-building their systems to take up the challenge.

The bigger the data, the higher the risk

Secondly, where there’s data there’s risk (especially taking into account the legacy problem we’ve mentioned above). It is clear that banking providers need to make sure the user data they accumulate and process remains safe at all times.

Yet only 38% of organizations worldwide are ready to handle the threat, according to ISACA International. That is why cybersecurity remains one of the most burning issues in banking.

Plus, data security regulations are becoming more stringent. The introduction of GDPR has placed certain restrictions on businesses worldwide that want to collect and use users’ data. This should also be taken into account.

Big data is getting too big

Given the many different kinds of data and their sheer volume, it’s no surprise that businesses struggle to cope. This becomes even more obvious when trying to separate valuable data from useless data.

While the share of potentially useful data is growing, there is still too much irrelevant data to sort out. This means that businesses need to prepare themselves and bolster their methods for analyzing even more data, and, if possible, find a new application for the data that has been considered irrelevant.

Despite the mentioned challenges, the advantages of big data in banking easily justify any risks. The insights it gives you, the resources it frees up, the money it saves – data is a universal fuel that can propel your business to the top.

The question is how to use big data in banking to its full potential.


  • 8 big data use cases in banking

Data is known to be one of the most valuable assets a business can have. Yet, it’s not the data itself that matters. It’s what you do with it.

To spark your creativity, here are some examples of big data applications in banking.


1. Personalized customer experience

According to Oracle, 84% of the surveyed executives agree that customers are looking for a more individualized, tailored experience. The report also states that the ability to offer users what they need can bring up to 18% higher annual revenue.

Just like other businesses across a number of domains, banks use big data to get to know their users and, as a result, find new ways to cater to them, connect in a more meaningful way, and deliver more value.

Your data can give you valuable insights into user behavior and help you optimize your customer experience accordingly. For example, by having a complete customer profile and exhaustive data on product engagement at hand, you can predict and prevent churn.

This approach is reportedly used at American Express. The company’s Australian branch relies on sophisticated predictive models to forecast and prevent customer churn.

By analyzing the data about previous transactions (as well as 115 other variables), they can identify accounts that are most likely to close within the next couple of months. As a result, the organization can take preventive actions and keep their customers from churning.

Read more about financial organizations using big data and AI to improve customer experience here.
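A toy version of churn prediction is sketched below. It uses logistic regression trained with plain batch gradient descent on two hypothetical, pre-scaled features. This is an illustration of the general technique, not American Express's model, which reportedly uses 115 variables. The accounts and labels are invented.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(rows, labels, lr=0.1, epochs=2000):
    """Fit logistic regression with batch gradient descent; returns (weights, bias)."""
    n = len(rows)
    w = [0.0] * len(rows[0])
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for x, y in zip(rows, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y  # predicted minus actual
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def churn_probability(x, w, b):
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

# Hypothetical features scaled to 0..1: [drop in monthly transactions, months since last login / 12]
accounts = [[0.8, 0.7], [0.9, 0.5], [0.7, 0.9], [0.1, 0.1], [0.2, 0.0], [0.0, 0.2]]
closed = [1, 1, 1, 0, 0, 0]  # 1 = account was closed within the period
w, b = train_logistic(accounts, closed)
print(round(churn_probability([0.85, 0.8], w, b), 2))
```

Accounts whose predicted probability exceeds a chosen cutoff can then be targeted with retention offers before they close.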

2. User segmentation and targeting

McKinsey finds that using data to make better decisions can save up to 15-20% of your marketing budget. Taking into account that banks spend on average 8% of their overall budgets on marketing, tapping into big data sounds like a great opportunity to not only save, but generate additional revenue through highly targeted marketing strategies.
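As a quick worked example of what those two figures imply together, assume the 15-20% saving applies to the whole marketing budget and marketing is 8% of total spend:

```latex
0.08 \times 0.15 = 0.012, \qquad 0.08 \times 0.20 = 0.016
```

That is, a saving of roughly 1.2% to 1.6% of a bank's overall budget, before counting any extra revenue from better targeting.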

By using big data, you can better understand your customers’ needs, pinpoint problems in your product targeting and find the best way to fix existing problems.

For example, Barclays has been using the so-called “social listening”, i.e. sentiment analysis, to source actionable insights from user activity on social networks.

When the company launched its mobile app, many people were unhappy with the fact that users under 18 were unable to transfer or receive money. The dissatisfied customers reacted by voicing their disappointment on social media.

As soon as the data collected by Barclays revealed the problem, the company was able to fix the issue by allowing users aged 16+ to access the app’s full capabilities.

3. Business process optimization and automation

Further research from McKinsey reveals that around 30% of all work in banks can be automated through technology, and the key to this lies in big data.

As a result of advanced automation, banks can experience significant cost savings and reduce the risk of failure by eliminating the human factor from some critical processes.

JP Morgan Chase & Co. is one of the automation pioneers in the banking services industry. The company currently employs several artificial intelligence and machine learning programs to optimize some of its processes, including algorithmic trading and the interpretation of commercial-loan agreements.

One of its programs, called LOXM, relies on historical data drawn from billions of transactions, enabling the bank to trade equities “at maximum speed and at optimal prices”, Business Insider reports. The program has proven far more efficient than both manual trading and the automated trading used earlier, and it has resulted in significant savings for the company.

Another data-based automation initiative from JP Morgan Chase is known as COIN. The machine learning algorithm, powered by the company’s private cloud network, is used to reduce the time needed to review documents: a task that previously required about 360,000 hours of work now takes just a few seconds.

The program also significantly decreased the human error associated with loan-servicing.

4. Improved cybersecurity and risk management

On top of optimizing its internal processes, as mentioned above, JP Morgan Chase relies on big data and AI to identify fraud and prevent terrorist activities among its own employees. The bank processes vast amounts of data to identify individual behavior patterns and reveal potential risks.

Another leading financial service provider, CitiBank, is also betting big on big data technologies. The company is investing in promising startups and is establishing partnerships with tech companies as a part of its initiative called Citi Ventures. Cybersecurity is one of the major spheres of interest the company has been exploring recently.

As a part of this strategic move, CitiBank invested in Feedzai, a data science company that uses real-time machine learning and predictive modeling to analyze big data, pinpoint fraudulent behavior and minimize financial risk for online banking providers.

As a result, CitiBank can spot any suspicious transactions, e.g. incorrect or unusual charges, and promptly notify users about them. Apart from being useful for consumers, the service also helps payment providers and retailers monitor all financial activity and identify threats related to their business.
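Feedzai's actual models are not described here. The sketch below illustrates a generic way to check charges in real time, one transaction at a time. It keeps a running mean and variance per account using Welford's online algorithm and flags charges far from that baseline. The warm-up length, cutoff and amounts are hypothetical.

```python
import math

class RunningStats:
    """Welford's online mean/variance, updated one transaction at a time."""
    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def std(self):
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0

def monitor(stream, z_cutoff=4.0, warmup=5):
    """Yield (index, amount) for charges far from the account's running average."""
    stats = RunningStats()
    for i, amount in enumerate(stream):
        if stats.n >= warmup and stats.std() > 0:
            if abs(amount - stats.mean) / stats.std() > z_cutoff:
                yield i, amount  # e.g. notify the user about this charge
                continue  # keep the suspicious charge out of the baseline
        stats.update(amount)

charges = [20.0, 22.5, 19.0, 21.0, 23.5, 20.5, 450.0, 21.5]
print(list(monitor(charges)))  # [(6, 450.0)]
```

Because the statistics update incrementally, each new charge is checked in constant time without rescanning the account's history.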

5. Better employee performance and management

Big data solutions in banking allow companies to collect, make sense of and share branch (as well as individual employee) performance metrics across departments in real time. This means better visibility into the day-to-day operations and an elevated ability to proactively solve any issues.

A global banking provider, BNP Paribas, collects and analyzes data on its branch productivity to identify and swiftly fix existing problems in real time.

Using the company’s data analytics software, branch managers and chief executives can get a bird’s-eye view of each branch’s performance based on a number of metrics, e.g. customer acquisition and retention, employee efficiency and turnover.
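A small sketch of how such branch metrics might be computed and used to flag underperforming branches is shown below. The record layout, metric names, branch figures and the 90% retention target are all hypothetical. They do not describe BNP Paribas's software.

```python
from dataclasses import dataclass

@dataclass
class BranchMonth:
    branch: str
    customers_at_start: int
    customers_retained: int
    new_accounts: int
    employees: int

def kpis(row):
    """Compute two simple per-branch metrics for one month."""
    return {
        "retention_rate": row.customers_retained / row.customers_at_start,
        "new_accounts_per_employee": row.new_accounts / row.employees,
    }

def flag_branches(rows, min_retention=0.9):
    """Return branches whose retention rate falls below the target."""
    return [r.branch for r in rows if kpis(r)["retention_rate"] < min_retention]

rows = [
    BranchMonth("North", 1200, 1140, 60, 12),
    BranchMonth("South", 900, 765, 45, 9),
]
print(flag_branches(rows))  # ['South']
```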

6. Informed lending decisions

Credit risk assessment is one of the main challenges for banks and is often troublesome for their clients. Traditionally, banks cooperate with other financial institutions that store and analyze the credit history of a certain client and estimate whether he or she is able to pay off a debt.

Nowadays, however, banks can access a much wider range of information related to creditworthiness. By looking at everything from individual transactions to overall spending habits, big data in banking and finance can evaluate a client’s financial behavior and allow banks to make lending decisions independently.

Kreditech has taken this a step further and looks at the bigger picture of a client’s reputation. The company uses sources such as Amazon and eBay, or social media such as Facebook, to better understand the financial behavior and personalities of potential borrowers.

The idea behind this approach is that a person’s creditworthiness can’t be measured accurately with traditional evaluation methods. Instead, the company strives to use every available source to assess the person’s situation at a given moment.
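Here is a minimal sketch of turning raw account activity into spending-habit features that a lending model could consume. The feature set and the six months of figures are hypothetical. They do not represent any specific bank's or Kreditech's approach.

```python
from statistics import mean, pstdev

def cashflow_features(monthly_inflows, monthly_outflows):
    """Summarize spending behaviour from monthly totals (hypothetical feature set)."""
    net = [i - o for i, o in zip(monthly_inflows, monthly_outflows)]
    return {
        "avg_income": mean(monthly_inflows),
        "spend_ratio": sum(monthly_outflows) / sum(monthly_inflows),  # share of income spent
        "months_negative": sum(1 for n in net if n < 0),               # months spending exceeded income
        "income_volatility": pstdev(monthly_inflows) / mean(monthly_inflows),
    }

inflows = [3000, 3000, 3200, 3000, 3100, 3000]
outflows = [2500, 2700, 2600, 3300, 2400, 2500]
print(cashflow_features(inflows, outflows))
```

Features like these can feed a scoring model alongside the traditional credit history, rather than replacing it.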

7. Enhanced client support

Customer support has always been a huge part of overall satisfaction with the services provided. The ability to communicate with the bank directly and without any obstacles is among the main demands of people who use banking services.

However, as the client base grows, conventional, human-only service can offer only limited individual assistance to each client. Fortunately, banks can use the data they have access to in order to offer faster and more precise predictions and solutions.

Artificial intelligence is one example of big data implementation in banking, as the technology relies on big data, enhanced by machine learning and predictive analysis. Bank of America’s AI-powered virtual assistant, Erica, can not only resolve clients’ queries and remind them about important dates and operations but also, for instance, help them improve their spending habits.
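How Erica works internally is not public. As a simple, hypothetical illustration of a spending-habit insight, the rule-based sketch below compares this month's spending per category with last month's. It produces a message for any category that rose by more than 25%. The categories, amounts and threshold are invented.

```python
def spending_nudges(last_month, this_month, rise_threshold=0.25):
    """Return messages for categories whose spending rose by more than `rise_threshold`."""
    nudges = []
    for category, now in sorted(this_month.items()):
        before = last_month.get(category, 0)
        if before > 0 and (now - before) / before > rise_threshold:
            nudges.append(f"{category}: spending up {(now - before) / before:.0%} vs last month")
    return nudges

last = {"dining": 200, "groceries": 400, "travel": 150}
current = {"dining": 320, "groceries": 410, "travel": 100}
print(spending_nudges(last, current))  # ['dining: spending up 60% vs last month']
```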

8. Advanced analysis of stock prices

In order to analyze potential target businesses for investment, it’s important for investors to estimate intangible assets, which have gained increasingly more importance in recent decades. However, classical approaches focus more on definite metrics and the overall background of a company.

Big data in investment banking gives an opportunity to gain insight into every area of a corporation’s activity, including its social reputation, environmental impact, human capital, innovation, and other things that can have an influence on stock prices.

That is exactly what Deutsche Bank focuses on. Its tool, called “a-DIG”, runs on the dbDIG (Data Innovation Group) platform. It scans the available information about the company in question to analyze its behavior and support informed investment decisions.


  • The future of big data in banking looks bright: Make sure to keep up

As you can see, there are many examples of how big data is used in banking. Yet, all those attempts have barely scratched the surface. The maximum potential of big data in banking is still to be harnessed.

Big data is becoming increasingly crucial in every area of the banking industry. Implementing big data in banking and finance is arguably the only way to regain control over client flow while maintaining an excellent level of service delivery, as the examples above show in several areas.

Banks need to rethink their operations and adopt data-driven approaches if they want to stay relevant and competitive. Plus, big data in the banking sector can help you improve and grow your business.

If you are looking to explore this opportunity but are struggling to find appropriate big data applications in the banking sector for your business, we at Eastern Peak can help you out.

Our team has vast experience implementing fintech products of different complexities as well as building big data solutions from scratch. Among other projects, we helped Western Union implement an advanced data mining solution to collect, normalize, visualize, and analyze various financial data on a daily basis.

So, if you want to discuss opportunities and big data implementation options in banking, request a personal consultation using our contact form.


About the author:


Alexey Shalimov, CEO at Eastern Peak

As CEO at Eastern Peak, a professional software consulting and development company, Alexey ensures top-quality and cost-effective services to clients from all over the world. Alexey is also a founder and technology evangelist at several technology companies. Previously, as CEO of the Gett (GetTaxi) technology company, Alexey was in charge of developing the revolutionary Gett service from the ground up and deploying the operation across the globe, from New York to London and Tel Aviv.


Jayanta Chakraborti

A Study to analyze the effectiveness of using Big Data Analytics in Banking Industry


In parallel, we are also seeing phenomenal growth in big data analytics. Big Data is characterized by the four V’s: Volume, Velocity, Variability and Veracity. Over the years, there has been an explosion of data emanating from POS (point-of-sale) systems, EDI (Electronic Data Interchange), ATM transactions and Internet Banking transactions. According to IBM, about 2.5 quintillion bytes (1 quintillion = 10^18) of data are generated every day. This data is mostly unstructured and is used for risk management, fraud detection, product customization and analysis of customer perception.

This research paper will examine the challenges the Banking Industry faces in analyzing Big Data and using it effectively for the profitability and growth of the business. The research will use both primary and secondary data and analyze them within an empirical framework. The aim is to identify the critical success factors for using big data analytics to grow the banking business with minimum risk.

Introduction

Banks have been the mainstay and supporting backbone of most industries, nationally and globally, since time immemorial. Ever since the barter system was replaced with a monetary system, banks have played a key role in treasury management, working capital management, investment management, project financing and several other key activities. Today, the global banking industry does business worth $410 trillion every year, with more than 15,000 banks taking an active part in financial transactions. The global leaders in the banking industry are Citibank, HSBC, JP Morgan, Goldman Sachs, Deutsche Bank, Wells Fargo, Bank of America, UBS, Credit Suisse and ICBC.

The banking industry in India has also grown phenomenally since independence. A string of cases of financial mismanagement and scandals had forced the Government to nationalize most of the banks in the 1960s and 70s. However, after 1991, following liberalization, globalization and privatization, private banking was also revived in a big way.

According to the Reserve Bank of India (RBI), there are presently 26 public sector banks, 20 private sector banks and 43 foreign banks that have been permitted to undertake banking operations in India. Two more entities, IDFC (Infrastructure Development Finance Company) and Bandhan, which was earlier a micro-finance company, have received banking licenses. There are also 61 regional rural banks and more than 90,000 co-operative banks. The banking sector in India has a net worth of Rs 81 trillion ($1.31 trillion). According to research by KPMG and CII, the Indian banking industry is set to become the fifth largest in the world by 2020 and the third largest by 2025.



The growth of the banking industry in India and across the world has been fuelled by two major strategic developments: innovation and financial deregulation. The Riegle-Neal Act of 1994 removed interstate banking and branching restrictions in the USA. The Gramm-Leach-Bliley Act of 1999 removed restrictions on mergers between commercial banks, investment banks, securities firms and insurance companies. The formation of the European Union paved the way for a single currency among European nations and the free flow of capital across all member nations.

Financial deregulation in India received a major thrust when the Narasimha Rao-Manmohan Singh Government implemented the liberalization-privatization-globalization policy in 1991. The Narasimham Committee, set up by the Indian Government in 1991, made key recommendations for greater autonomy for Public Sector Banks, and policy makers later adopted these recommendations. Subsequently, a second committee set up under Mr Narasimham's stewardship in 1997 advocated healthy competition between private sector and public sector banks.

The new millennium has also seen a rise in innovative approaches to banking. Banks have made extensive use of the technological revolution that popularized laptops and mobile phones and brought the internet to the masses. Starting with internet banking, mobile banking and remote banking, technology has helped banks reduce their dependence on physical branches and reach a wider range of customers through virtual banking tools. In recent years, mobile banking has grown dramatically, with the volume of business increasing from 25.56 million in 2012 to 53.3 million in 2013. Over the same period, the value of transactions grew from $0.2 billion in 2012 to $1.1 billion in 2013 (Source: Business Today, Jan 18, 2015). According to a report published in Business Today, 85 percent of transactions by HDFC Bank customers now take place through non-branch channels, and the mobile and internet banking channels contribute almost 55 percent. HDFC Bank is now transforming itself from a brick-and-mortar entity into a full-scale digital bank.
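A line of arithmetic on the Business Today figures quoted above shows the growth rates they imply. The helper name `growth_pct` is ours, not the source's. The snippet only restates the reported numbers.

```python
def growth_pct(old: float, new: float) -> float:
    """Percentage change from an old value to a new one."""
    return (new - old) / old * 100


# Figures as reported by Business Today (Jan 18, 2015)
volume_2012, volume_2013 = 25.56, 53.3   # volume of business, millions
value_2012, value_2013 = 0.2, 1.1        # value of transactions, US$ billion

print(f"Volume growth: {growth_pct(volume_2012, volume_2013):.1f}%")  # 108.5%
print(f"Value growth:  {growth_pct(value_2012, value_2013):.1f}%")   # 450.0%
```

On these figures, volume roughly doubled while transaction value grew about 5.5 times, so value grew considerably faster than volume.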



However, innovation and deregulation, though contributing to the rapid growth of the banking industry, have also come at a cost. According to a report published in Business World (29 December, 2014), 414 insured US commercial banks failed between January 2008 and December 2011. Of these, 85 percent were small banks with less than $1 billion in assets. The common underlying causes of failure were excessive credit growth and bad real-estate loans.

Indian banks were also at the receiving end, and their NPAs (Non-Performing Assets) grew to a disproportionate size. The gross NPA ratio of Indian banks was 4% as of 31st March, 2014, compared with 4.2% on 31st March, 2013. Banks wrote off a total of Rs 1,61,018 crore in loans over the last five years. Faced with an increasing number of scandals in the industry and a stricter policy from the Reserve Bank of India, Indian banks are looking to upgrade their technology systems so that they can analyze real-time data and predict fraud or illegal activities.
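A simple way to flag suspicious activity in a real-time stream is to compare each new transaction with the account's recent history. The sketch below is a hypothetical illustration and does not describe any Indian bank's actual system. The class name, the 20-transaction window, the minimum history of 5 and the 3-standard-deviation threshold are all assumptions chosen for readability.

```python
from collections import deque
from statistics import mean, pstdev


class RollingAnomalyDetector:
    """Flags amounts that deviate strongly from an account's recent history.

    Hypothetical sketch: the window size, minimum history and z-score
    threshold are illustrative assumptions, not any bank's real settings.
    """

    def __init__(self, window: int = 20, threshold: float = 3.0, min_history: int = 5):
        self.window = window
        self.threshold = threshold
        self.min_history = min_history
        self.history: dict[str, deque] = {}

    def check(self, account: str, amount: float) -> bool:
        """Return True if `amount` looks anomalous for `account`, then record it."""
        past = self.history.setdefault(account, deque(maxlen=self.window))
        flagged = False
        if len(past) >= self.min_history:
            mu, sigma = mean(past), pstdev(past)
            if sigma > 0:
                # z-score of the new amount against the rolling window
                flagged = abs(amount - mu) / sigma > self.threshold
            else:
                # All past amounts identical: any different amount stands out
                flagged = amount != mu
        past.append(amount)
        return flagged


detector = RollingAnomalyDetector()
for amount in [100, 120, 95, 110, 105, 5000]:
    print(amount, detector.check("ACC-001", amount))
```

A production fraud system would combine many such signals (device, location, merchant, transaction velocity) with learned models. This single rule only shows where real-time data enters the decision.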

Banks are also competing to extend their customer reach through internet-based tools. They display customized product offerings through Internet Banking, Mobile Banking and ATMs. HDFC Bank is going digital in a big way and has a systematic internal focus on getting customers to use and adopt digital channels. HDFC Bank has already invested in data warehousing, analytics, outbound call centers and customer relationship management models.

Reducing the risks in banking transactions and reaching a wider customer base both require access to a huge quantum of data and the capability to process it into meaningful inferences for decision making. This is where Big Data Analytics is poised to play an important role, both in promoting the growth of the banking industry and in mitigating risk. Our study is an endeavor to understand how banks can use big data analytics effectively to formulate strategies that improve both the topline and the bottom line.

The Advent of Big Data Analytics

Gartner defines big data as high-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making. Every day, 2.5 quintillion bytes of data are created. Data volumes are increasing exponentially, from terabytes to petabytes, exabytes and now zettabytes. According to IBM, 80% of the data captured today is unstructured. It comes from sources such as sensors used to gather climate information, posts to social media sites, digital pictures and videos, purchase transaction records, and cell phone GPS signals. All of this unstructured data is big data. Big Data is a collection of data sets so large and complex that they become difficult to process using on-hand database management tools or traditional data processing applications.
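The jump from terabytes to zettabytes is easier to grasp as repeated factors of 1,000. The small helper below rewrites a raw byte count using decimal (SI) prefixes. The function name `humanize` is hypothetical. Applied to the 2.5 quintillion (2.5 × 10^18) bytes created per day, it gives 2.5 exabytes.

```python
UNITS = ["bytes", "KB", "MB", "GB", "TB", "PB", "EB", "ZB"]


def humanize(n_bytes: float) -> str:
    """Express a byte count with the largest decimal (SI) prefix that keeps the value >= 1."""
    value = float(n_bytes)
    unit = 0
    # Each step up the ladder (KB -> MB -> ... -> ZB) is a factor of 1,000
    while value >= 1000 and unit < len(UNITS) - 1:
        value /= 1000
        unit += 1
    return f"{value:g} {UNITS[unit]}"


print(humanize(2.5e18))  # 2.5 EB
```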

FIGURE 3: Big Data = Transactions + Interactions + Observations


Big Data is mainly generated by:

  • Social Media and Networks
  • Scientific Instruments
  • Mobile Devices
  • Sensor Technology and Networks

The payoff from using big data analytics to analyze banking transactions is enormous. The number of successful case studies continues to grow. This reinforces broader research suggesting that companies that embed data and analytics deep in their banking operations can deliver more effective sales and higher profits. Data-driven sales, more in-depth information about consumer behavior, better predictions and shorter decision-making cycles are leading companies to adopt this model at a faster rate. According to Gartner, Big Data will drive $232 billion in IT spending through 2016.

Hence, banks need a strategic plan for leveraging data, analytics, frontline tools and people to create business value. The plan will be effective only if it creates a common language that senior executives, technology professionals, data scientists and marketing managers can use to discuss where the greatest returns will come from.

FIGURE 4: The Four V's of Big Data


Research Methodology

Research Objective and Research Design

The research design for the study is based on a descriptive research model. The analysis draws on data collected through primary research as well as on relevant published secondary data.

Data Capturing Instrument

The data was captured using questionnaires. Each questionnaire consisted of 10 questions.

The data sought pertained to the challenges banks face in processing big data and in adopting strategies to use it meaningfully.

The questionnaires were administered in person as well as distributed and collected by e-mail.

The primary data was collected through questionnaires administered to banking professionals who are familiar with the technologies used in banking and use them in day-to-day operations and transactions.

The sample size for primary data collection was 100. The respondents were selected through simple random sampling.
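In simple random sampling, every member of the sampling frame has the same chance of selection and no one is picked twice. The sketch below does this in Python with the standard library's `random.sample`. The frame of 1,200 placeholder respondent IDs is purely hypothetical, since the study does not report its population size. The function name is also illustrative.

```python
import random
from typing import Optional


def simple_random_sample(frame: list[str], n: int, seed: Optional[int] = None) -> list[str]:
    """Draw n distinct members, each with equal probability (sampling without replacement)."""
    if n > len(frame):
        raise ValueError("sample size cannot exceed the sampling frame")
    rng = random.Random(seed)  # a seeded generator makes the draw reproducible
    return rng.sample(frame, n)


# Hypothetical sampling frame of banking professionals (placeholder IDs only)
frame = [f"respondent-{i:04d}" for i in range(1, 1201)]
respondents = simple_random_sample(frame, 100, seed=42)
print(len(respondents), len(set(respondents)))
```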

The secondary data was collected from a diversified pool of resources, such as newspapers, journals and research articles published by consulting companies like McKinsey, E&Y, PwC, KPMG and CII. The secondary data helps to validate the inferences drawn from the primary research.

Data Analysis

The data has been analyzed using statistical tools like graphical analysis and percentage analysis.
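Percentage analysis of a closed-ended question means counting each answer option and dividing by the number of respondents. The sketch below shows only the mechanics. The answers are illustrative placeholders and are not the study's results. The function name is hypothetical.

```python
from collections import Counter


def percentage_table(responses: list[str]) -> dict[str, float]:
    """Share of respondents choosing each answer option, in percent (most frequent first)."""
    total = len(responses)
    if total == 0:
        return {}
    counts = Counter(responses)
    return {option: 100 * count / total for option, count in counts.most_common()}


# Illustrative placeholder answers only; these are NOT the study's results.
answers = ["Yes", "Yes", "Yes", "No"]
for option, pct in percentage_table(answers).items():
    print(f"{option:<5}{pct:6.1f}%")
```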

Results / Findings

1. Do you use big data analytics for your banking operations?

2. What are the main purposes for which you use big data analytics?

3. How well trained are your personnel to leverage the power of big data analytics?

4. What are the main challenges that you face in utilization of big data analytics?

5. What is the importance of big data analytics for formulating strategies in your banking operations?

6. What benefits are you deriving by using big data analytics?

7. Has big data analytics contributed significantly to the topline and bottom-line of your business?

8. Has big data analytics been effective to provide meaningful insights into customer behaviour?

9. What do you think will be the future drivers of banking operations?

10. What are your recommendations for better utilization of big data analytics?

Data Inference and Conclusion

Banks mainly use big data analytics for marketing analytics, risk and fraud management, and strategy formulation. Big data analytics helps banks with customer segmentation, customer retention, campaign management, and cross-selling and up-selling. In risk and fraud analytics, banks use credit scorecards to estimate the propensity to default, and they build early warning systems (EWS) that raise alerts on fraudulent practices. For strategy formulation, banks use data visualization interfaces and profit and loss (P&L) reporting.
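Credit scorecards are commonly built on a logistic-regression estimate of the probability of default (PD), which is then rescaled to points. The sketch below assumes hypothetical coefficients and one widely used scaling convention: a base score of 600 at good:bad odds of 50:1, with 20 points to double the odds. None of these values come from the study or from any particular bank.

```python
import math

# Hypothetical logistic-regression coefficients for probability of default (PD).
# A real scorecard would estimate these from the bank's historical loan outcomes.
INTERCEPT = -3.0
COEFFICIENTS = {
    "utilisation": 2.5,          # share of credit limit in use (0..1)
    "missed_payments": 0.8,      # missed payments in the last 12 months
    "years_as_customer": -0.15,  # longer relationships lower the risk
}


def probability_of_default(applicant: dict[str, float]) -> float:
    """Logistic model: PD = 1 / (1 + exp(-(b0 + sum(b_i * x_i))))."""
    z = INTERCEPT + sum(b * applicant[name] for name, b in COEFFICIENTS.items())
    return 1.0 / (1.0 + math.exp(-z))


def scorecard_points(pd: float, base_score: float = 600.0,
                     base_odds: float = 50.0, pdo: float = 20.0) -> float:
    """Map PD to points: base_score at good:bad odds of base_odds, +pdo points per doubling of odds."""
    factor = pdo / math.log(2)
    offset = base_score - factor * math.log(base_odds)
    good_bad_odds = (1.0 - pd) / pd
    return offset + factor * math.log(good_bad_odds)


applicant = {"utilisation": 0.4, "missed_payments": 1, "years_as_customer": 6}
pd = probability_of_default(applicant)
print(f"PD = {pd:.3f}, score = {scorecard_points(pd):.0f}")
```

Points scale with the log of the odds, so scores stay additive across risk factors and are easy to set cut-offs on.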

FIGURE 5: Application of Big Data Analytics in Banking


Banks such as HDFC Bank and ICICI Bank started investing in big data analytics in 2004. They built data warehouses and invested in cutting-edge data mining tools that could unravel business trends, map customer preferences and raise alerts to prevent risky occurrences. The benefits they have accrued are now motivating other banks to put in place systems to capture and analyze big data. Based on our study, we propose the following strategies to help banks use big data analytics effectively.

Use of Predictive Analytics

Predictive analytics can help banks decide on the right mix of customers as well as on interest rates and insurance premiums. It can also help banks cross-sell and up-sell banking products. Other applications of predictive analytics include risk management, fraud management and marketing campaign management.
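For cross-selling, one common starting point is market-basket style association rules. If many customers who hold product A also hold product B, then customers with A but not B become cross-sell candidates. This is a swapped-in illustration of the idea, not a method the study reports. The product names and holdings are hypothetical, and `min_support` is an assumed cut-off.

```python
from itertools import permutations


def pair_rules(holdings: list[set[str]], min_support: float = 0.2):
    """Support, confidence and lift for rules 'customers holding A also hold B'."""
    n = len(holdings)
    if n == 0:
        return []
    products = sorted(set().union(*holdings))

    def support(*items: str) -> float:
        # Fraction of customers holding every product in `items`
        return sum(1 for h in holdings if set(items) <= h) / n

    rules = []
    for a, b in permutations(products, 2):
        s_ab = support(a, b)
        if s_ab < min_support:
            continue
        confidence = s_ab / support(a)   # P(B | A)
        lift = confidence / support(b)   # > 1 means A makes B more likely than average
        rules.append((a, b, s_ab, confidence, lift))
    return sorted(rules, key=lambda r: r[3], reverse=True)


# Hypothetical product holdings for five customers
customers = [
    {"savings", "credit_card"},
    {"savings", "credit_card", "home_loan"},
    {"savings", "home_loan"},
    {"savings"},
    {"credit_card", "personal_loan"},
]
for a, b, s, c, l in pair_rules(customers):
    print(f"{a} -> {b}: support={s:.2f} confidence={c:.2f} lift={l:.2f}")
```

In this toy data, every home-loan holder also has a savings account (confidence 1.0, lift 1.25). A real deployment would feed such rules, or a trained propensity model, into campaign management.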

Predictive analytics plays a key role in helping banks retain customers, since acquiring a new customer costs much more than retaining an existing one. Predictive analytics makes it much easier to identify a customer's dissatisfaction issues and rectify them well in advance, which keeps the customer loyal to the bank. Predictive tools such as SAS Text Miner, IBM SPSS, Cognos and SAP HANA can mine the data and draw out predictive inferences for the bank to act upon.

Leveraging internal, external and social media data

Big Data comprises information drawn from the bank's internal interfaces, from its external interactions and from social networks. Internal data consists of the bank's financial records, such as the balance sheet, the profit and loss statement, and the cash flow and fund flow statements. External data is created during e-mail exchanges, telephone banking operations, internet and mobile banking transactions, and ATM usage. Social media data comes from Facebook, Twitter and LinkedIn discussions and from Google search engine operations. Banks need to harness and synchronize internal, external and social media data and apply multivariate analysis to obtain meaningful insights.
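Before any multivariate analysis, the internal, external and social media sources have to be aligned on a common key. The minimal sketch below assumes that every source is already keyed by the same customer ID. In practice, resolving identities across sources is often the hard part. The source names, field names and values are hypothetical.

```python
from collections import defaultdict


def build_customer_view(*sources: tuple[str, dict[str, dict]]) -> dict[str, dict]:
    """Merge per-customer records from several named sources into one flat profile.

    Fields are prefixed with the source name, so 'internal.balance' and
    'social.mentions' can never overwrite each other.
    """
    view: dict[str, dict] = defaultdict(dict)
    for source_name, records in sources:
        for customer_id, fields in records.items():
            for field, value in fields.items():
                view[customer_id][f"{source_name}.{field}"] = value
    return dict(view)


# Hypothetical extracts, all keyed by the same customer ID
internal = {"C1": {"balance": 12500.0}, "C2": {"balance": 830.0}}
external = {"C1": {"atm_withdrawals_30d": 4}, "C2": {"mobile_logins_30d": 21}}
social = {"C2": {"negative_mentions": 3}}

profiles = build_customer_view(("internal", internal), ("external", external), ("social", social))
print(profiles["C2"])
```

Customers who are missing from a source (C1 has no social data) simply lack those fields. Downstream multivariate models then have to handle the gaps, for example by imputation.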

Netherlands-based Rabobank has been a pioneer in using big data analytics. It began by analyzing internal data. It then moved to full-fledged big data analytics by incorporating public data from government sources, click-behaviour data and social network data. It also built small clusters using open source technology to test and analyze unstructured data sets. Rabobank now makes extensive use of big data analytics to choose ATM locations that offer it better leverage.

Creating Master Data Management Framework to manage structured and unstructured data

Big Data comes in both structured and unstructured formats. Structured data takes the form of quantifiable numbers, while unstructured data is mostly qualitative and takes the form of text, images, or audio and video recordings. Banks have historically depended on structured data to analyze, draw inferences and take decisions. The changing scenario, however, makes it imperative that they pay equal or even greater attention to unstructured data, which can yield even more incisive inferences. To achieve this, banks have to create a master data management framework that can manage both structured and unstructured data.
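At the heart of a master data management framework is the "golden record": one consolidated view of each customer, assembled from several systems using survivorship rules. The sketch below assumes that attributes have already been extracted into a common record shape. For unstructured sources such as call-centre notes, that extraction is itself a text-mining task. The source names and their priority order are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class CustomerRecord:
    source: str                # which system the record came from
    customer_id: str
    email: Optional[str]
    phone: Optional[str]
    updated: date


# Hypothetical trust ranking of source systems (lower number = more trusted)
SOURCE_PRIORITY = {"core_banking": 0, "crm": 1, "call_centre_notes": 2}


def golden_record(records: list[CustomerRecord]) -> dict[str, Optional[str]]:
    """Survivorship rule: per attribute, keep the non-empty value from the most
    trusted source, breaking ties by the most recent update.
    Assumes all records belong to the same customer."""
    ranked = sorted(
        records,
        key=lambda r: (SOURCE_PRIORITY.get(r.source, 99), -r.updated.toordinal()),
    )
    merged: dict[str, Optional[str]] = {"customer_id": records[0].customer_id}
    for attr in ("email", "phone"):
        merged[attr] = next((getattr(r, attr) for r in ranked if getattr(r, attr)), None)
    return merged


records = [
    CustomerRecord("crm", "C7", "c7@example.com", "555-0101", date(2015, 3, 1)),
    CustomerRecord("core_banking", "C7", None, "555-0199", date(2014, 1, 1)),
]
print(golden_record(records))
```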

Bank of America is driving several initiatives to manage structured and unstructured data. It is also reorganizing its internal operations to understand customers better and to offer them customized products and services with greater appeal. One of these initiatives, “BankAmeriDeals”, provides cash-back offers to credit card and debit card holders based on their payment records. The bank also offers refinancing deals to customers after analyzing data on credit card usage or mortgage loan repayments. BoA uses a dedicated grid-computing platform to detect high-risk accounts and take corrective action to prevent defaults.

Training the Employees

Big data analytics can be adopted successfully only with internal thrust and agreement across the organization, driven both from the top down and from the bottom up. For a completely successful implementation, big data analytics should not be confined to the IT or MIS departments. Only when every employee is well trained in big data analytics will employees understand its importance and the returns that new-age tools can generate.

To make its big data analytics program more successful, Bank of America created a matrix organizational structure in which analytics teams were interspersed with other functions. This helped every unit of the bank analyze customer data, understand customers better and offer customized services that increase customer satisfaction.

Educating the Customers

Customers are the end users and final beneficiaries of big data analytics. For big data analytics to succeed, banks have to take customers into confidence and educate them about how new-age technologies let the bank offer real-time, prompt and effective services that raise customer satisfaction. Higher satisfaction will ultimately translate into greater customer loyalty, leading to better margins and ROI (return on investment).

References

1. Barth, Paul; Earley, Seth; Lawson, Lauraine. Turning Big Data Into Useful Information. Quinstreet Storage ebook.
2. BT-KPMG Best Bank Survey, 2015. pp. 56-78, Business Today, January 18, 2015.
3. BW-PwC Best Bank Survey, 2014. pp. 56-105, Business World, December 29, 2014.
4. Cherian Roy; Gupta, Anunay. Application of Decision Sciences to Solve Business Problems in Retail Banking Industry. Marketintelligent.
5. CIMA, 2010. The global banking sector: current issues. www.cimaglobal.com
6. Coumaros, Jean; Buvat, Jerome; 2014. Big Data Alchemy: How can banks maximize the value of their customer data. Capgemini Consulting.
7. Cukier, Kenneth; Mayer-Schonberger, Viktor (2013-03-14). Big Data: A Revolution That Will Transform How We Live, Work and Think (p. 70). John Murray. Kindle Edition.
8. Dahlström, Peter; Edelman, David (2013). The coming era of On Demand Marketing. April 2013. McKinsey & Company. Kindle Edition.
9. Daruvala, Toos. Advanced Analytics redefines banking. McKinsey Quarterly, April 2013.
10. Gordon, Jonathan; McGuire, Tim; Goyal, Manish; Spillecke, Dennis (2012). Big Data & Advanced Analytics. McKinsey & Company, December 2012. Kindle Edition.
11. Hurwitz, Judith; Nugent, Alan; Halper, Fern; Kaufman, Marcia (2013-04-02). Big Data For Dummies (Kindle Locations 571-573). Wiley. Kindle Edition.
12. KPMG-CII, 2013. Indian Banking: Maneuvering through turbulence; Emerging strategies.
13. Kumar, Anjani; Shenoy, Raghavendra; 2010. Analytics in Retail Banking. Finsights, Infosys.
14. Krishna, Vishal; 2013. How Big is Big Data. Business World, 25th March, 2013.
15. Laeven, Luc; Ratnovski, Lev; Tong, Hui; 2014. Bank Size and Systemic Risk. International Monetary Fund (IMF).
16. Leibowitz, Josh; Ungerman, Kelly; Masri, Maher (October 2012). Know your Customers wherever they are. McKinsey & Company. Kindle Edition.
17. McKinsey Chief Marketing & Sales Officer Forum (2013-07-09). Big Data, Analytics, and the Future of Marketing & Sales (Kindle Locations 314-317). McKinsey & Company. Kindle Edition.
18. McAfee, Andrew; Brynjolfsson, Erik (October 2012). Big Data: The Management Revolution. Harvard Business Review.
19. PwC; 2013. Banking in 2050: How the financial crisis has affected the long term outlook for the global banking industry.
20. Reserve Bank of India. Branch Banking Statistics. http://www.rbi.org.in/scripts/AnnualPublications.aspx
21. Sen, Anirban. Banking on Big Data Analytics. Livemint, Wed, Jul 23 2014.
22. Schilch, Bill; Baggs, Ian; 2013. Transforming banks, Redefining banking. Ernst & Young.
23. Turner, David; Schroeck, Michael; Shockley, Rebecca; 2013. Analytics: The real-world use of big data in financial services. IBM Institute for Business Value.


Associate Professor, Chandigarh University.

Jayanta Chakraborti is Associate Professor of Business Analytics and Digital Marketing at Chandigarh University. He holds a Bachelor’s degree in Mechanical Engineering from TEC Agartala, MBA from IMI Europe, MA in Economics from Devi Ahilyabai University Indore, Diploma in French from Delhi University and Certification in German from Max Mueller Munich.


Technologies and Applications for Big Data Value, pp. 273–297

Big Data Analytics in the Banking Sector: Guidelines and Lessons Learned from the CaixaBank Case

  • Andreas Alexopoulos
  • Yolanda Becerra
  • Omer Boehm
  • George Bravos
  • Vasilis Chatzigiannakis
  • Cesare Cugnasco
  • Giorgos Demetriou
  • Iliada Eleftheriou
  • Lidija Fodor
  • Spiros Fotis
  • Sotiris Ioannidis
  • Dusan Jakovetic
  • Leonidas Kallipolitis
  • Vlatka Katusic
  • Evangelia Kavakli
  • Despina Kopanaki
  • Christoforos Leventis
  • Mario Maawad Marcos
  • Ramon Martin de Pozuelo
  • Miquel Martínez
  • Nemanja Milosevic
  • Enric Pere Pages Montanera
  • Gerald Ristow
  • Hernan Ruiz-Ocampo
  • Rizos Sakellariou
  • Raül Sirvent
  • Srdjan Skrbic
  • Ilias Spais
  • Giorgos Vasiliadis
  • Michael Vinov
  • Open Access
  • First Online: 29 April 2022


A large number of EU organisations already leverage Big Data pools to drive value and investments, and the banking sector is no exception. For example, CaixaBank currently manages more than 300 different data sources (more than 4 PetaBytes of data and increasing), which more than 700 internal and external active users and services process every day. To harness value from such high-volume, high-variety data, banks need to resolve several challenges. These include finding efficient ways to perform Big Data analytics and providing solutions that increase the involvement of bank employees, the true decision-makers. In this chapter, we describe how the self-service solution developed within the I-BiDaaS project resolves these challenges. We present three CaixaBank use cases in more detail, namely (1) analysis of relationships through IP addresses, (2) advanced analysis of bank transfer payment in financial terminals and (3) enhanced control of customers in online banking, and describe how the corresponding requirements are mapped to specific technical and business KPIs. For each use case, we present the architecture, data analysis and visualisation provided by the I-BiDaaS solution, reporting on the achieved results, domain-specific impact and lessons learned.

  • Self-service solution
  • Security applications
  • Big data analytics
  • Advanced analytics
  • Visualisations


1 Introduction

Collection, analysis and monetisation of Big Data is rapidly changing the financial services industry, upending the longstanding business practices of traditional financial institutions. By leveraging vast data repositories, companies can make better investment decisions, reach new customers, improve institutional risk control and capitalise on trends before their competitors. But given the sensitivity of financial information, Big Data also spawns a variety of legal and other challenges for financial services companies. Footnote 1

Following this digitalisation trend, CaixaBank has been developing its own Big Data infrastructure for years and has received several awards (e.g. ‘2016 Best Digital Retail Bank in Spain and Western Europe’ by Global Finance). With almost 14 million clients across Spain (and Portugal, under its subsidiary brand BPI), CaixaBank has a network of more than 5000 branches with over 40,000 employees. It manages an infrastructure of more than 9500 ATMs, 13,000 servers and 30,000 handhelds. Together, these systems and channels collect a massive amount of data every day, gathering relevant information about the bank's operations from clients, employees, third-party providers and autonomous machines. In total, CaixaBank has more than 300 different data sources used by its consolidated Big Data models and more than 700 internal and external active users enriching its data every day. The result is a Data Warehouse of more than 4 PetaBytes (PBs) that grows by 1 PB per year.

CaixaBank already applies Big Data analytics techniques to much of this information, for example, to generate security alerts and prevent potential fraud; the bank faces around 2000 attacks per month. CaixaBank is also one of the banking leaders in European and national collaborative research and takes part in pre-competitive research projects. Within the EU I-BiDaaS project (funded by the Horizon 2020 Programme under Grant Agreement 780787), CaixaBank identified three concrete use cases, namely (1) analysis of relationships through IP addresses, (2) advanced analysis of bank transfer payment in financial terminals and (3) enhanced control of customers in online banking. These use cases were used to study the potential of a Big Data self-service solution that would empower the bank's employees, who are the true decision-makers, by giving them the insights and tools they need to make the right decisions in a much more agile way.

In the rest of this chapter, Sect. 2 discusses the requirements and challenges for Big Data in the banking sector. Section 3 details the different use cases considered, together with their technical and business KPIs. In Sect. 4, for each use case, we present the architecture, data analysis and visualisation of the I-BiDaaS solution, reporting on the achieved results and domain-specific impact. That section also relates the described solutions to the BDV reference model and the priorities of the BDV Strategic Research and Innovation Agenda (SRIA) [ 1 ]. Section 5 summarises the lessons learned through all the experiments deployed by CaixaBank and the other I-BiDaaS partners, especially on how to handle data privacy and how to iteratively extend data usage scenarios. Finally, Sect. 6 presents some conclusions.

2 Challenges and Requirements for Big Data in the Banking Sector

The vast majority of banking and financial firms globally believe that the use of insight and analytics creates a competitive advantage. The industry also realises that it is sitting on a vast reservoir of data and that insights can be leveraged for product development, personalised marketing and advisory benefits. Regulatory reforms are also a main driver of this change. Other growth drivers include ailing businesses and customer settlements, the ongoing economic crisis in other industry verticals, the high cost of new technology and business models, and a high degree of industry consolidation and automation. Many financial services firms currently focus on improving their traditional data infrastructure, as they have been addressing issues such as customer data management, risk, workforce mobility and multichannel effectiveness. These daily problems led financial organisations to deploy Big Data as a long-term strategy, and it has turned out to be the fastest-growing technology adopted by financial institutions over the past 5 years. Footnote 2

Focusing on the customer is increasingly important. The critical path in this direction is to move data analytics tools and services down to the employees who interact directly with customers, using Big-Data-as-a-Self-Service solutions Footnote 3 [ 2 ].

Another critical requirement for financial organisations is to use data and advanced analytics for fraud and risk mitigation and for achieving regulatory and compliance objectives. With cyber security more important than ever, falling behind in the use of data for security purposes is not an option. Real-time views and analysis are critical to competitive advantage in the financial and banking sector.

Big Data analytics is gradually being integrated into many departments of CaixaBank (security, risks, innovation, etc.). As a result, the bank has a heterogeneous group of experts with different skills, and it also relies on several Big Data analytics experts who provide consultancy services. Broadly, however, the people working with the huge amount of data collected from CaixaBank's different sources and channels can be grouped into the following categories (which could indeed be fairly generalised to other financial entities):

IT and Big Data expert users: employees and third-party consultants with excellent programming skills and Big Data analytics knowledge.

Intermediate users: people with some notion of data analytics who are used to working with some Big Data tools, especially for visualisation and Big Data visual analysis (such as QlikSense/QlikView Footnote 4). They are not skilled programmers, although they can program simple algorithms or functions in Python or R.

Non-IT users: people with an excellent knowledge of the field and the sector; they could interpret the data, but they lack programming skills or Big Data analytics knowledge.

Although ‘IT and Big Data expert users’ are becoming more involved and are a relevant part of the entity's day-to-day business operations, they are few compared to the ‘Intermediate’ and ‘Non-IT users’. One of the most relevant challenges for CaixaBank is lowering the barriers and the knowledge these user categories need to exploit the collected data efficiently.

With this in mind, the I-BiDaaS methodology for eliciting CaixaBank requirements (see Table 1) took into consideration the specific challenges faced by CaixaBank, as well as the literature on Requirements Engineering (RE) approaches specifically for Big Data applications [ 3 ].

In particular, the I-BiDaaS methodology followed a goal-oriented approach to requirements engineering [ 4 ]. In this approach, requirements elicitation is seen as a systematic transformation. It starts from high-level business goals, which reflect the company vision with respect to the Big Data analytics activity or project. These goals are turned into the user requirements of the stakeholder groups involved (e.g., data providers, Big Data capability providers, data consumers). Finally, the user requirements are turned into specific functional and non-functional system requirements, which describe the behaviour that a Big Data system (or a system component) should exhibit, or the capabilities it should possess, in order to realise the intentions of its users.

The requirements elicitation process was carried out in collaboration with both CaixaBank stakeholders and Big Data technology providers. It involved two steps: the first step was to extract specific requirements based on the characteristics of each CaixaBank use case; the second step involved the consolidation of all requirements in a comprehensive list. Appropriate questionnaires were used to assist participants in expressing their requirements. Requirements consolidation was guided by generic requirements categories identified through the review of RE works for big data applications [ 5 ].

Although described in a linear fashion, the above activities were carried out in an iterative manner resulting in a stepwise refinement of the results being produced. The complete list of all requirements elicited is described in detail in [ 6 ].

3 Use Cases Description and Experiments’ Definition: Technical and Business KPIs

The CaixaBank experiments aim at evaluating and validating the self-service Big Data platform [ 7 ] proposed in the framework of the I-BiDaaS project, and its implementation in the specific CaixaBank use cases. More precisely, the experiments aim to test how efficiently the I-BiDaaS platform reduces the cost and time of analysing large datasets whilst preserving data privacy and security.

The experiments are defined following a goal-oriented approach. For each experiment, the goal(s) towards which the measurement will be performed are defined first. Then a number of questions are formulated to characterise the achievement of each goal. Finally, a set of Key Performance Indicators (KPIs) and related metrics is associated with every question in order to answer it in a measurable way.

Such KPIs were defined at the business level during the user requirements elicitation phase (see Sect. 2). However, they need to be further elaborated and refined so that they can be mapped onto specific indicators at the Big Data application and platform level. This ensures that the KPIs take into consideration (a) both business and technical requirements and (b) the traceability between business and application performance. In addition, the baseline (current) value and the desired improvement should be defined for each KPI; measuring against them shows whether the specific indicator has been achieved.
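The goal, question and KPI hierarchy described above, essentially the Goal-Question-Metric pattern, maps naturally onto a small data model in which each KPI carries its baseline and desired improvement. The class names are hypothetical, and the example numbers are illustrative and do not come from the CaixaBank experiments.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class KPI:
    name: str
    baseline: float                  # current value before the experiment
    target_change: float             # desired relative change, e.g. -0.30 = 30% lower
    measured: Optional[float] = None

    def achieved(self) -> bool:
        """True if the measured value reaches the desired relative change from the baseline."""
        if self.measured is None:
            return False
        change = (self.measured - self.baseline) / self.baseline
        # A negative target means "lower is better" (e.g. analysis time)
        if self.target_change < 0:
            return change <= self.target_change
        return change >= self.target_change


@dataclass
class Question:
    text: str
    kpis: list[KPI] = field(default_factory=list)


@dataclass
class Goal:
    statement: str
    questions: list[Question] = field(default_factory=list)

    def achieved(self) -> bool:
        # A goal counts as met only when every KPI under every question is met
        return all(k.achieved() for q in self.questions for k in q.kpis)


goal = Goal(
    "Reduce the time needed to analyse large datasets",
    [Question("Is analysis faster with the self-service platform?",
              [KPI("analysis time (hours)", baseline=10.0, target_change=-0.30, measured=6.5)])],
)
print(goal.achieved())
```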

The definition of each experiment also specified its workflow, i.e. the type and order of the activities involved, and the experimental subjects who would take part in it.

Taking all the aforementioned into account, CaixaBank proposed three different use cases and evaluated the I-BiDaaS tools from the perspective of potential usage by those different groups of employees:

Analysis of relationships through IP addresses.

Advanced analysis of bank transfer payment in financial terminals.

Enhanced control of customers in online banking.

The rest of the section presents the final use case definitions in the chronological order in which they were developed and deployed in the project. We also refer the reader to Sect. 4 for further details of the corresponding solutions, and to Sect. 5 for a complementary description that collects the lessons learned during the respective processes.

3.1 Analysis of Relationships Through IP Addresses

Analysis of relationships through IP addresses was the first use case selected to test the I-BiDaaS Minimum Viable Product (MVP). In this use case, CaixaBank aims to validate the usage of synthetic data and of external Big Data analytics platforms. The use case is deployed to identify relationships between customers who use the same IP address when connecting to online banking. CaixaBank stores information about its customers and the operations they perform (bank transfers, checking their accounts, etc.) through channels such as mobile apps or online banking. It afterwards uses this data for security and fraud-prevention processes. One of these processes identifies relationships between customers and uses them to verify subsequent bank transfers between linked customers. Such transfers are considered less likely to be fraudulent, which allows CaixaBank's Security Operation Centre (SOC) to discard them directly during the review processes. The goals of this experiment are to validate the use of synthetic data for analysis (one of the I-BiDaaS platform features), to evaluate the quality of the synthetic data (i.e. whether the algorithm finds the same number and patterns of connections in the real and the synthetic datasets) and to test the time efficiency of the I-BiDaaS solution.
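The core of this use case is linking customers who connected from the same IP address, then checking whether a transfer's sender and receiver are linked. It can be sketched in a few lines. This is an illustrative reconstruction under stated assumptions, not the I-BiDaaS implementation. The function names are hypothetical, the IPs come from the 192.0.2.0/24 documentation range, and `max_customers_per_ip` is an assumed guard against widely shared addresses such as public Wi-Fi. The last function mirrors the synthetic-data check described above by comparing how many links two datasets yield.

```python
from collections import defaultdict
from itertools import combinations


def link_customers_by_ip(sessions: list[tuple[str, str]],
                         max_customers_per_ip: int = 10) -> set[frozenset]:
    """Return unordered pairs of customers who connected from the same IP address."""
    customers_by_ip: dict[str, set[str]] = defaultdict(set)
    for customer_id, ip in sessions:
        customers_by_ip[ip].add(customer_id)
    links: set[frozenset] = set()
    for customers in customers_by_ip.values():
        # Assumed guard: skip IPs shared by many customers (NAT, public Wi-Fi)
        if len(customers) > max_customers_per_ip:
            continue
        for a, b in combinations(sorted(customers), 2):
            links.add(frozenset((a, b)))
    return links


def is_linked_transfer(sender: str, receiver: str, links: set[frozenset]) -> bool:
    """A transfer between linked customers is a candidate for lower review priority."""
    return frozenset((sender, receiver)) in links


def link_count_ratio(real_links: set[frozenset], synthetic_links: set[frozenset]) -> float:
    """Links found in synthetic data relative to real data (1.0 = same count)."""
    return len(synthetic_links) / len(real_links) if real_links else float("nan")


sessions = [
    ("C1", "192.0.2.10"), ("C2", "192.0.2.10"),
    ("C3", "192.0.2.11"),
    ("C1", "192.0.2.12"), ("C3", "192.0.2.12"),
]
links = link_customers_by_ip(sessions)
print(sorted(sorted(pair) for pair in links))   # [['C1', 'C2'], ['C1', 'C3']]
print(is_linked_transfer("C2", "C1", links))     # True
```

Matching link counts is only a first check on synthetic data quality. Comparing which patterns of connections appear would require inspecting the link sets themselves.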

3.2 Advanced Analysis of Bank Transfer Payment in Financial Terminals

The second CaixaBank use case studied in I-BiDaaS is advanced analysis of bank transfer payment in financial terminals. It aims to distinguish reliable transfers from possibly fraudulent ones. The goal of this experiment is to test the efficiency of the I-BiDaaS solution for anomaly detection in bank transfers made from employees’ workstations (financial terminals).

To that end, the first step was to identify all the contextual information about the bank transfer (e.g. execution time, transferred amount, etc.), the sender and receiver (e.g. name, surname, nationality, physical address, etc.), the employee (e.g. employee ID, authorisation level, etc.) and the bank office (e.g. office ID, type of bank office, etc.). All this information comes from several relational database tables stored in the CaixaBank Big Data infrastructure (called ‘datapool’). The meaningful information was extracted and flattened into a single table. This task is particularly challenging because the events and instances in the log file that correspond to money transfer operations carried out by an employee from a bank centre must be identified, and those belonging to the same bank transfer must be linked together. The heterogeneous nature of the log files, as saved in the CaixaBank datapool, makes this even more difficult: there are 969,351,155 events in the log data for April 2019 alone, and they arise from a mix of disparate operations associated with the services provided by the different types of bank offices. After a laborious table flattening and composition process, a table of 32 fields was obtained and then tokenised.

3.3 Enhanced Control of Customers in Online Banking

In this use case, we focused on analysing mobile-to-mobile bank transfers executed via online banking (web and app). The use case assesses whether the controls applied to user authentication (e.g. Strong Customer Authentication (SCA) by means of second-factor authentication) are adequately implemented according to the PSD2 regulation, depending on the context of the bank transfer. With that aim, we clustered a dataset collected from mobile-to-mobile transfers. Most of the information in this dataset did not need encryption because only a few fields were sensitive. The main objective of the use case is to identify useful patterns of mobile-to-mobile bank transfers and enhance current cybersecurity mechanisms. A set of mobile-to-mobile bank transfers, together with the authentication mechanisms CaixaBank applied to authenticate their senders, is analysed to determine whether different mechanisms should be applied to specific subsets of transfers. In particular, the use case looks for small subsets of transfers that show patterns similar to larger sets of transfers authenticated with other mechanisms, in order to re-assess whether the correct authentication mechanism is being applied to such a subset or whether the level of security in the authentication process should be increased.

4 I-BiDaaS Solutions for the Defined Use Cases

4.1 Analysis of Relationships Through IP Addresses

4.1.1 Architecture

The solution follows a traditional component-based architecture in which the components communicate via a message queue (the Universal Messaging component). This approach is important for a scalable and flexible organisation of hardware resources. The architecture includes a batch and a stream processing subcase, complemented by the Universal Messaging component, which eases communication between components; an Orchestration layer, which coordinates their interaction; and a Visualisation layer, which provides an extensible visualisation framework for data inspection and user interaction with the system. The Universal Messaging component uses a message queue system that allows easy, robust and concurrent communication between components. The Orchestration layer uses Docker to manage the other components, each of which runs as a separate Docker container. Figure 1 depicts the components of the architecture and their interactions.

Figure 1. Updated data flow for the components of the architecture

The batch processing subcase starts with the creation of a file of realistic synthetic data in SQLite format (TDF component Footnote 5). This file is imported into a Cassandra database, chosen specifically for its distributed properties, and COMPSs [8] with Hecuba Footnote 6 are used to run the analysis. In the streaming subcase, user transactions are created and published via the Message Queuing Telemetry Transport (MQTT) protocol (Universal Messaging). A GPU-enabled APAMA Footnote 7 data processing application then loads the results of the batch analysis, compares every record arriving from the stream against them and generates a new message whenever there is a match. 3D visual data analysis is also available via the Qbeast tool [9].
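The matching step of the streaming subcase can be illustrated with a small, self-contained sketch. It assumes the batch analysis has already produced a set of related user pairs; Python's standard `queue.Queue` stands in for the MQTT topics (Universal Messaging), and a plain loop stands in for the APAMA application. The names `related_pairs` and `match_stream` and the message fields are hypothetical, not part of the I-BiDaaS code base.

```python
import queue

# Hypothetical output of the batch subcase: pairs of users found to be
# related through shared IP addresses. frozenset makes the pair unordered.
related_pairs = {frozenset({"u1", "u2"}), frozenset({"u3", "u4"})}


def match_stream(incoming, outgoing, related):
    """Consume transactions until a None sentinel arrives and publish a
    message for every transfer between two previously related users."""
    while True:
        tx = incoming.get()
        if tx is None:  # sentinel marking the end of the stream
            break
        if frozenset({tx["sender"], tx["receiver"]}) in related:
            outgoing.put({"type": "related_transfer", **tx})


# queue.Queue stands in for the MQTT topics used in the real pipeline.
incoming, outgoing = queue.Queue(), queue.Queue()
for tx in [
    {"id": 1, "sender": "u1", "receiver": "u2"},
    {"id": 2, "sender": "u1", "receiver": "u9"},
    {"id": 3, "sender": "u4", "receiver": "u3"},
]:
    incoming.put(tx)
incoming.put(None)

match_stream(incoming, outgoing, related_pairs)
alerts = [outgoing.get() for _ in range(outgoing.qsize())]
print([a["id"] for a in alerts])  # [1, 3]
```

Keying the lookup on an unordered pair makes it independent of transfer direction, reflecting the assumption that a relationship between two customers is symmetric. In the use case, such a message marks a transfer between linked customers, which the SOC can then treat as lower risk.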

The rationale for using realistic synthetic data (TDF component in Fig. 1) is that technology development and testing can be simplified and accelerated before, or in parallel with, the process of making real data available (e.g. a tedious data tokenisation process). Realistic synthetic data is incorporated with care and is subject to data quality assessment (see Sect. 4.1.2). An operational-ready solution then replaces the synthetic data with the corresponding real, tokenised data, as described in the subsequent sections.

4.1.2 Data Generation

In this first use case, we evaluated the usage of fabricated data, created with TDF according to a set of rules defined by CaixaBank. The rules were refined several times in order to generate realistic data for every field, respecting the format of the real data. It is difficult to distinguish a sample of a given field in the synthetic dataset from a sample of the same field in the real dataset. Some properties were difficult to model as constraint rules, e.g. the specific temporal connectivity patterns that the real data follows, and were therefore not included in the specification of the synthetic dataset. Constraints on parameters that were not critical for the relationship analysis performed in this use case were sometimes relaxed, as long as the synthetic dataset remained valid for checking that it contains the same percentage of relationships as the real dataset.
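To make rule-driven fabrication concrete, the sketch below generates records with the five attributes used in this use case from a few hand-written rules. It is not TDF's API and not CaixaBank's rule set: the ID format, IP pool, one-month window and operation/status codes are invented for illustration. The property the text describes as hard to encode, realistic temporal connectivity, is deliberately absent here, since dates are drawn uniformly.

```python
import random
from datetime import datetime, timedelta

# Hypothetical code lists and time window; not CaixaBank's real values.
OPERATIONS = ["TRF", "BAL", "LOG"]
STATUSES = ["OK", "KO"]
START = datetime(2019, 4, 1)


def make_rules(n_users, n_ips, rng):
    """One generator per field. An IP pool much smaller than the number of
    records forces IP reuse across users, so relationships can appear."""
    ip_pool = [f"10.0.{i // 256}.{i % 256}" for i in range(n_ips)]
    return {
        "user_id": lambda: f"U{rng.randrange(n_users):06d}",
        "ip": lambda: rng.choice(ip_pool),
        # Uniform over 30 days: no realistic time-of-day pattern is modelled.
        "date": lambda: START + timedelta(seconds=rng.randrange(30 * 24 * 3600)),
        "operation": lambda: rng.choice(OPERATIONS),
        "status": lambda: rng.choices(STATUSES, weights=[95, 5])[0],
    }


def fabricate(n_records, rules):
    return [{field: gen() for field, gen in rules.items()} for _ in range(n_records)]


rng = random.Random(42)  # fixed seed so the fabricated set is reproducible
records = fabricate(1000, make_rules(n_users=300, n_ips=120, rng=rng))
print(records[0])
```

Each rule only constrains the format and range of its own field; relaxing or tightening a rule, as described above for non-critical parameters, means editing a single generator.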

4.1.3 Data Analytics

The goal of the case is to find relations between people, given a set of connections to IP addresses, maximising the detection of close relations between users. This application has been implemented using the COMPSs programming model and Hecuba as the data interface with the Cassandra database.

We defined several parallel tasks, not only to exploit parallelism but also to benefit from COMPSs’ automatic detection of dependencies between tasks. Storing the data in Cassandra allows us to delegate the management of the global view of the data to the database. This frees programmers from implementing explicit synchronisation between the parallel tasks that modify the data structure. By removing these synchronisation points, we maximise the degree of parallelism of the application and thus the utilisation of the hardware resources. Note also that the interface for inserting data into Cassandra is asynchronous with respect to the execution of the application, so data storage overlaps with computation.

To implement this, we defined a clustering-based analysis of CaixaBank’s IP address connections using a synthetic dataset. The purpose of the analysis is to provide additional modelling possibilities for this CaixaBank use case. The results should be interpreted bearing in mind that the dataset used is synthetic, although the initial feedback from CaixaBank about the usefulness of the developed process is positive and the approach is promising. The dataset contains 72,810 instances, each with the following attributes:

User ID: a unique identification number for each user.

IP address: the IP address from which the user connected.

Date: the date and time of the user’s connection.

Operation: the code of the business operation performed by the user.

Status: the code of the status of the operation performed by the user.

Initially, the dataset is transformed so that each user represents a sample and each IP address represents a feature. In this data matrix, the value at position (i, j) is the number of times user i connected via IP address j. The resulting matrix is extremely sparse. To tackle this problem and retain only meaningful data, the next pre-processing step drops all the IP addresses that were used by only one user (intuitively, such IP addresses correspond to home networks, etc., and therefore cannot be used to infer relationships between users). After dropping them, 1075 distinct IP addresses remain of the initial 22,992 in the original dataset. Finally, we filter out the users who are not connected to any of the remaining IP addresses.
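The transformation and the two filtering steps can be sketched in plain Python as follows. A dict-of-dicts keeps the matrix sparse, which matters given how sparse the text reports it to be; the function name and the toy connections are hypothetical, and the reported counts (1075 of 22,992 IP addresses) come from the project's data, not from this example.

```python
from collections import Counter, defaultdict


def build_user_ip_matrix(connections):
    """connections: iterable of (user_id, ip) pairs, one per connection.
    Returns (users, ips, rows) where rows[user][ip] is the number of times
    that user connected via that IP, keeping only IPs shared by >1 user."""
    counts = Counter(connections)  # (user, ip) -> number of connections
    users_per_ip = defaultdict(set)
    for user, ip in counts:
        users_per_ip[ip].add(user)
    # Drop IPs used by a single user: they cannot link two users.
    shared_ips = {ip for ip, users in users_per_ip.items() if len(users) > 1}
    rows = defaultdict(dict)
    for (user, ip), n in counts.items():
        if ip in shared_ips:
            rows[user][ip] = n
    # Users left with no shared IP never enter `rows`, so they are filtered out.
    return sorted(rows), sorted(shared_ips), dict(rows)


connections = [
    ("alice", "1.1.1.1"), ("alice", "1.1.1.1"), ("bob", "1.1.1.1"),
    ("bob", "2.2.2.2"), ("carol", "3.3.3.3"), ("dave", "2.2.2.2"),
]
users, ips, rows = build_user_ip_matrix(connections)
# Dense view (samples x features), e.g. as input to a clustering algorithm.
dense = [[rows[u].get(ip, 0) for ip in ips] for u in users]
print(users)  # ['alice', 'bob', 'dave']
print(dense)  # [[2, 0], [1, 1], [0, 1]]
```

Here "carol" disappears together with the only IP address she used, which nobody else shares.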

To infer relationships between users, we applied clustering algorithms. In particular, we used K-means [10] and DBSCAN [11], which are both available in the dislib library [12]. Additionally, we used the t-distributed Stochastic Neighbour Embedding (t-SNE) method [13] to visualise the reduced dataset in 2D. The visualisation is presented in Fig. 2.

Figure 2. t-SNE 2D visualisation

Both K-means and DBSCAN offer useful hyperparameters. K-means gave us the flexibility of setting the desired number of clusters. A preset number of clusters can be a limitation, especially in an exploratory phase; however, being able to set it allowed us, for example, to match the number of clusters to the number of authentication mechanisms, which was useful in the analysis of the use case. DBSCAN, on the other hand, determines the number of clusters itself and instead exposes two parameters: the minimum number of samples in a neighbourhood for a point to be considered a core point, and the maximum distance between two samples for them to be considered part of the same neighbourhood. These parameters are set by the end-user based on experimentation and domain knowledge and can be tuned through the I-BiDaaS user interface.
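The two DBSCAN parameters described above are usually called `min_samples` and `eps` in scikit-learn-style APIs; we assume dislib's DBSCAN uses similar names, but this sketch does not call dislib. It shows only how the parameters decide which points are core points, which is the part a user tunes, and is not a full DBSCAN. Following scikit-learn's documented convention, a point counts itself as one of its neighbours.

```python
import math


def core_points(points, eps, min_samples):
    """Indices of DBSCAN core points: points with at least `min_samples`
    samples (themselves included) within distance `eps` (inclusive here)."""
    cores = []
    for i, p in enumerate(points):
        neighbours = sum(1 for q in points if math.dist(p, q) <= eps)
        if neighbours >= min_samples:
            cores.append(i)
    return cores


pts = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (20, 20)]
print(core_points(pts, eps=1.5, min_samples=3))  # [0, 1, 2]
print(core_points(pts, eps=1.5, min_samples=2))  # [0, 1, 2, 3, 4]
```

Lowering `min_samples` from 3 to 2 turns the isolated pair (5, 5) and (5, 6) into core points, i.e. into a cluster of its own, while (20, 20) remains noise. This is the kind of effect an end-user explores when tuning the parameters.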

Moreover, the evaluation of this use case focused especially on validating the fabricated data for identifying patterns and numbers of connections. Therefore, a more advanced analysis with K-means and DBSCAN was performed on both the synthetic dataset and a tokenised version of a real dataset. The tokenisation process encrypted all the fields of the dataset. The analysis performed on this dataset allowed conclusions and relationships to be inferred for the real, non-encrypted data.

4.1.4 Visualisations

The visualisation of the use case includes several graphic types. First, a graph shows the distribution of relationships detected based on their IP addresses (Fig. 3a).

Figure 3. (a) User groups per IP relationships. (b) Real-time relationship detection

Using these relationships, visualisation of real-time bank transfers in the form of a continuous stream of sender-receiver records is used to emulate real-time detection of possibly fraudulent transactions (Fig. 3b). The visualisation utilises the previously detected relationships to display a graph of connected users so as to aid operators in determining possible relationships between users and deciding whether further actions should be taken.

4.1.5 Results

Results obtained with these algorithms on both the real tokenised data and the synthetic data showed that the majority of the clusters found were 2-point clusters, indicating good similarity between the two datasets for this use case.

An additional evaluation was performed to determine a specific utility score, i.e. the similarity between the results of analyses on the synthetic data and on the original data. The propensity mean-squared error (pMSE) was used as a general measure of data utility, applied to the specific case of synthetic data. As specific utility measures, we used various types of data analyses, confidence interval overlap and standardised differences in summary statistics, which were combined with the general utility results (Fig. 4).

Figure 4. Results for 100 random samples taken from the real and synthetic data (5K data points each) and the pMSE calculated using a logistic model

By randomly sampling 5000 data points from each of the real and synthetic datasets and using logistic regression to estimate the probability of each label, we measured a mean pMSE score of 0.234, with a standard deviation of 0.0008, for the ‘analysis of relationships through IP addresses’ dataset.
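For reference, the pMSE as commonly defined in the synthetic-data literature is computed from the propensity scores p̂_i that a classifier (here, logistic regression) assigns to each pooled record being synthetic: pMSE = (1/N) Σ_i (p̂_i − c)², where N is the number of pooled records and c is the share of synthetic records. Under this definition, with equal sample sizes (c = 0.5), the score lies between 0, when the classifier cannot tell the two datasets apart, and 0.25, when it separates them perfectly; lower values indicate higher utility. The sketch below only evaluates the formula on given scores. Fitting the logistic model and repeating the 100 samplings behind Fig. 4 are left out, and we have not verified that the project's tool uses exactly this normalisation.

```python
def pmse(propensities, is_synthetic):
    """Propensity mean-squared error.
    propensities: P(record is synthetic) predicted for every pooled record.
    is_synthetic: matching 0/1 labels, used here only to compute c."""
    n = len(propensities)
    c = sum(is_synthetic) / n  # share of synthetic records in the pool
    return sum((p - c) ** 2 for p in propensities) / n


labels = [0] * 5 + [1] * 5                  # equal sample sizes, so c = 0.5
print(pmse([0.5] * 10, labels))             # 0.0  (indistinguishable)
print(pmse([0.0] * 5 + [1.0] * 5, labels))  # 0.25 (perfectly separated)
```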

These quantitative results showed that the fabricated data is objectively realistic enough to be used for testing the use case. However, the rule-generation process involved in data fabrication through TDF can be complex and lengthy in other cases, where knowledge of the data is incomplete or rules cannot be clearly extracted through statistical analysis.

4.2 Advanced Analysis of Bank Transfer Payment in Financial Terminals

4.2.1 Architecture

The architecture (i.e. the specific components of the I-BiDaaS general architecture) in this use case is the same as the one described in Sect. 4.1.1 , focused on the batch processing part. Therefore, what essentially changes are the algorithms used for processing the data (i.e. the bank transfers conducted by employees on their financial terminals). These algorithms will be described in the next sections.

4.2.2 Data Analytics

This CaixaBank use case focuses on advanced analysis of bank transfers executed by employees on financial terminals to detect possible fraud or any other potential anomalies that deviate from the standard working procedure. The dataset used is composed of attributes that record the different steps the employee performs, together with other important data such as the account, the client or the amount of money transferred. All the data is encrypted using the Dice coefficient [14], which codifies the data without losing important information.
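Strictly speaking, the Dice coefficient, Dice(A, B) = 2|A ∩ B| / (|A| + |B|), is a set-similarity measure rather than a cipher. A common way to pair it with encoding, and the reading assumed in this sketch, is to map each value's character bigrams into a Bloom filter using keyed hashes and to compare the resulting bit sets with the Dice coefficient, so that similar plaintexts yield similar encodings (Sect. 5 mentions a Bloom-filter-based process for free-text fields). The key, filter size and number of hashes are illustrative assumptions, not the project's parameters, and this toy construction should not be treated as a vetted privacy-preserving scheme.

```python
import hashlib
import hmac


def bigrams(text):
    padded = f"_{text.lower()}_"  # padding so first and last characters count
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def bloom_encode(text, key, size=256, n_hashes=4):
    """Set n_hashes keyed (HMAC-SHA256) bit positions per bigram.
    Returns the set of set-bit positions, i.e. a sparse bit vector."""
    bits = set()
    for gram in bigrams(text):
        for i in range(n_hashes):
            digest = hmac.new(key, f"{i}:{gram}".encode(), hashlib.sha256).digest()
            bits.add(int.from_bytes(digest[:4], "big") % size)
    return bits


def dice(a, b):
    """Dice coefficient of two sets (1.0 for two empty sets)."""
    return 2 * len(a & b) / (len(a) + len(b)) if (a or b) else 1.0


key = b"secret-kept-on-premises"  # hypothetical key, never leaves the bank
e1 = bloom_encode("Barcelona", key)
e2 = bloom_encode("Barcelone", key)
e3 = bloom_encode("Valencia", key)
print(round(dice(e1, e2), 2), round(dice(e1, e3), 2))
```

The first score is much higher than the second because "Barcelona" and "Barcelone" share most bigrams, which illustrates how such an encoding can preserve similarity information that clustering can exploit without revealing the plaintext.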

All data processing techniques, such as K-means, PCA (Principal Component Analysis) [15] and DBSCAN, were performed using the dislib library. In addition, the data structure used by dislib was modified so that it can be stored in a Cassandra database through the Hecuba library.

The received dataset must be pre-processed before applying the data transformation techniques from dislib. First, attributes containing the same value for all registers were deleted, as they do not provide any relevant information. Next, all null and blank values were transformed into 0. Finally, each categorical attribute was transformed into one binary (1/0) column per category, a transformation known as one-hot encoding [16].
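The three pre-processing steps can be sketched in plain Python, applied in the order given above. The attribute names are hypothetical, and this is a stand-in for illustration, not the dislib-based code used in the project. Because nulls are replaced before encoding, a blank categorical value becomes a category of its own ("0").

```python
def preprocess(rows, categorical):
    """rows: list of dicts, one per register; categorical: set of the names
    of categorical attributes. Returns (column_names, table)."""
    columns = list(rows[0])
    # 1. Drop attributes holding the same value in every register.
    kept = [c for c in columns if len({r[c] for r in rows}) > 1]
    # 2. Replace nulls and blanks with 0.
    clean = [{c: 0 if r[c] in (None, "") else r[c] for c in kept} for r in rows]
    # 3. One-hot encode categorical attributes: one 0/1 column per category.
    out_cols = []
    for c in kept:
        if c in categorical:
            out_cols += [(c, v) for v in sorted({str(r[c]) for r in clean})]
        else:
            out_cols.append((c, None))
    table = [
        [r[c] if v is None else int(str(r[c]) == v) for c, v in out_cols]
        for r in clean
    ]
    names = [c if v is None else f"{c}={v}" for c, v in out_cols]
    return names, table


registers = [  # hypothetical attributes
    {"office_type": "A", "amount": 100, "channel": "T1", "country": "ES"},
    {"office_type": "B", "amount": None, "channel": "T1", "country": "ES"},
    {"office_type": "A", "amount": 250, "channel": "T1", "country": ""},
]
names, table = preprocess(registers, categorical={"office_type", "country"})
print(names)  # ['office_type=A', 'office_type=B', 'amount', 'country=0', 'country=ES']
```

The constant `channel` attribute is dropped, and each categorical attribute expands into as many columns as it has categories, which is why the encoding grew the dataset from 89 to 501 attributes.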

Because of these encoding transformations, the number of attributes increased considerably, from 89 to 501. This large number of attributes made it difficult to run K-means, so we decided to apply a PCA transformation that reduces the number of dimensions to 3, which also allows the data to be represented graphically. Before applying PCA, and because the attributes differ in magnitude, we standardised the data using the scikit-learn StandardScaler.
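A minimal NumPy sketch of these two steps follows: per-column standardisation (zero mean, unit population variance, which, to our understanding, is what StandardScaler does with its defaults, including leaving zero-variance columns unscaled) followed by PCA via the singular value decomposition, keeping three components. Random data on deliberately mismatched scales stands in for the 501 encoded attributes; in the project these steps were run with scikit-learn and dislib, not with this code.

```python
import numpy as np


def standardise(X):
    """Zero mean and unit (population) variance per column. Zero-variance
    columns are divided by 1 instead of 0."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0
    return (X - mean) / std


def pca(X, n_components=3):
    """Project centred data onto its leading principal components (via SVD).
    Returns the projected data and the component vectors (rows)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]
    return Xc @ components.T, components


rng = np.random.default_rng(0)
# Synthetic stand-in for the encoded attributes: columns on very different scales.
X = rng.normal(size=(200, 10)) * rng.uniform(1, 1000, size=10)
Z, components = pca(standardise(X), n_components=3)
print(Z.shape)  # (200, 3)
```

Without the standardisation step, the components would be dominated by whichever columns happen to have the largest magnitude, which is the problem the text addresses with StandardScaler.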

Finally, we executed two different clustering algorithms: DBSCAN and K-means. As K-means requires the desired number of clusters as an input parameter, we first ran DBSCAN and used the number of clusters it found as the input parameter of K-means.
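The DBSCAN-then-K-means chaining can be sketched with scikit-learn's estimators, which dislib's distributed counterparts resemble. We have not checked that every dislib parameter name matches, so this should be read as an illustration rather than the project's code. The one subtle point is that DBSCAN labels noise points as -1, and these must not be counted as a cluster.

```python
import numpy as np
from sklearn.cluster import DBSCAN, KMeans


def dbscan_then_kmeans(Z, eps, min_samples, random_state=0):
    """Run DBSCAN, count its clusters (label -1 marks noise and is not a
    cluster), then run K-means with that number of clusters."""
    db_labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(Z)
    k = len(set(db_labels) - {-1})
    if k == 0:
        raise ValueError("DBSCAN found no clusters; adjust eps/min_samples")
    km = KMeans(n_clusters=k, n_init=10, random_state=random_state)
    return db_labels, km.fit_predict(Z), k


# Three well-separated synthetic blobs standing in for the 3-D PCA output.
rng = np.random.default_rng(1)
centres = np.array([[0, 0, 0], [10, 10, 10], [-10, 10, 0]])
Z = np.vstack([c + rng.normal(scale=0.3, size=(50, 3)) for c in centres])
db_labels, km_labels, k = dbscan_then_kmeans(Z, eps=1.0, min_samples=5)
print(k)
```

Setting `n_init` and `random_state` explicitly keeps the K-means result reproducible across runs and library versions.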

4.2.3 Visualisations

For this use case, a 3D graph of the data and detected anomalies has been developed. Users can select parts of the graph to focus on and can also extract the specific data samples that are included in the selection (Fig. 5 ).

Figure 5. Visualisation of detected anomalies in 3D graph

4.2.4 Results

Figure 6a shows the clusters generated by DBSCAN in a three-dimensional space, where the third PCA dimension is shown on the Z-axis. The result of K-means can be examined in Fig. 6b: some values on the Z-axis lie far away from the main cluster and are therefore potential anomalies in the data.

Figure 6. (a) DBSCAN representation. (b) K-means representation

Since PCA reduced the attributes from 501 to 3, it is difficult to understand how the resulting three dimensions relate to the 501 original attributes. Figure 7 shows the correlation between them. We show only the first 84 attributes because they are the most relevant to the third dimension (the Z-axis). This third dimension is heavily influenced by attributes 64 to 82.
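One way to obtain the values behind such a heat-plot is to compute the Pearson correlation between every original attribute and every principal component, giving an (attributes × components) matrix whose column for the third component corresponds to the Z-axis. The sketch below does this in NumPy on random data with one deliberately correlated pair of attributes. It assumes constant columns were already removed (as in the pre-processing step), and it is not necessarily how Fig. 7 was produced.

```python
import numpy as np


def attribute_component_correlation(X, Z):
    """Pearson correlation of every original attribute (columns of X) with
    every principal component (columns of Z); shape (n_attrs, n_comps).
    Assumes no constant columns (they were dropped in pre-processing)."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    Zs = (Z - Z.mean(axis=0)) / Z.std(axis=0)
    return Xs.T @ Zs / X.shape[0]


rng = np.random.default_rng(0)
X = rng.normal(size=(300, 6))
X[:, 5] = 3 * X[:, 0] + rng.normal(scale=0.1, size=300)  # a correlated pair
Xs = (X - X.mean(axis=0)) / X.std(axis=0)                # standardise, then PCA
_, _, Vt = np.linalg.svd(Xs, full_matrices=False)
Z = Xs @ Vt[:3].T

C = attribute_component_correlation(X, Z)
# Attributes ranked by the strength of their link to the third component,
# i.e. the column a heat-plot would show for the Z-axis.
print(np.argsort(-np.abs(C[:, 2])))
```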

Figure 7. Heat-plot of the 84 most relevant attributes from the 501 original attributes

4.3 Enhanced Control of Customers in Online Banking

4.3.1 Architecture

As in Sect. 4.2.1, the components used to analyse bank transfers executed through online banking (both web and mobile app) are the same as those described in Sect. 4.1.1, again with the main focus on the batch processing part; what changes is the set of algorithms used to analyse the data, described in the following sections.

4.3.2 Data Analytics

Following the objective described in Sect. 3.3, this use case aimed to identify useful patterns of mobile-to-mobile bank transfers and enhance current cybersecurity mechanisms by determining whether there is a set of transactions for which the level of security in the authentication process should be increased. We therefore decided to analyse a dataset containing one month of mobile-to-mobile bank transfers from clients and to work with unsupervised methods such as clustering. The clustering had to be performed on largely categorical data, on which most well-known algorithms lose effectiveness. The first attempt was to apply K-means. However, since the vast majority of the available variables were not numerical, calculating the distances used for grouping in the K-means algorithm was no longer straightforward (e.g. if there are three types of enhanced authentication, should the distances between them be equal, or should some be greater because some mechanisms are more restrictive than others?). Such questions affect the result of the model, so the data was transformed using one-hot encoding [16]. This transformation eliminated the problem of calculating distances between categories. Even so, the results were not satisfactory. We therefore searched for a model better suited to this case and found the k-modes library, which includes algorithms for clustering categorical data.

The k-modes algorithm [17] is essentially the well-known K-means, with modifications that allow it to work with categorical variables. It uses a simple matching dissimilarity measure to compare categorical objects, replaces the cluster means with modes and uses a frequency-based method to update the modes during the clustering process so as to minimise the clustering cost function.
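The three ingredients just listed (simple matching dissimilarity, modes instead of means, frequency-based mode updates) can be seen in the following minimal pure-Python sketch. It is not the implementation of the k-modes library used in the project. The deterministic farthest-first initialisation and the toy fields (channel, authentication method, amount band) are assumptions made for illustration.

```python
from collections import Counter


def matching_dissimilarity(a, b):
    """Number of attributes on which two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))


def init_modes(records, k):
    """Deterministic farthest-first initialisation: start from the first
    record, then repeatedly add the record least similar to the modes
    chosen so far (an illustrative choice among several)."""
    modes = [list(records[0])]
    while len(modes) < k:
        farthest = max(records, key=lambda r: min(matching_dissimilarity(r, m) for m in modes))
        modes.append(list(farthest))
    return modes


def k_modes(records, k, max_iter=20):
    """Minimal k-modes. Returns (labels, modes, cost), where cost is the
    total matching dissimilarity of records to their cluster modes."""
    modes = init_modes(records, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest mode under simple matching dissimilarity.
        new_labels = [
            min(range(k), key=lambda j: matching_dissimilarity(r, modes[j]))
            for r in records
        ]
        if new_labels == labels:
            break  # assignments unchanged: converged
        labels = new_labels
        # Update step: per attribute, the most frequent category in the cluster.
        for j in range(k):
            members = [r for r, lab in zip(records, labels) if lab == j]
            if members:  # keep the previous mode if a cluster is empty
                modes[j] = [Counter(col).most_common(1)[0][0] for col in zip(*members)]
    cost = sum(matching_dissimilarity(r, modes[lab]) for r, lab in zip(records, labels))
    return labels, modes, cost


# Hypothetical fields: (channel, authentication method, amount band).
transfers = [
    ("app", "SCA", "low"), ("app", "SCA", "low"), ("app", "SCA", "mid"),
    ("web", "OTP", "high"), ("web", "OTP", "high"), ("web", "OTP", "mid"),
]
labels, modes, cost = k_modes(transfers, k=2)
print(labels, cost)  # [0, 0, 0, 1, 1, 1] 2
```

No distance between categories has to be invented: two records simply differ on a number of attributes, which sidesteps the questions about relative distances between authentication types raised above.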

Once the algorithm had been chosen, we had to determine the optimal number of clusters for our use case. For this, we applied the elbow method, which locates the optimal number of clusters as follows. We first define:

Distortion: the average of the squared distances from each sample to the centre of its cluster.

Inertia: the sum of the squared distances from each sample to its closest cluster centre.
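The two definitions can be written as a short function. The sketch below uses Euclidean distance for illustration; with k-modes, the matching dissimilarity would take its place. Note that, under these definitions, distortion is simply inertia divided by the number of samples, so both curves have the same shape and point to the same elbow.

```python
import math


def distortion_and_inertia(points, labels, centres):
    """Inertia: sum of squared distances from each sample to its cluster
    centre. Distortion: the same squared distances averaged over samples."""
    sq = [math.dist(p, centres[lab]) ** 2 for p, lab in zip(points, labels)]
    return sum(sq) / len(sq), sum(sq)


pts = [(0, 0), (2, 0), (10, 0), (12, 0)]
labels = [0, 0, 1, 1]
centres = [(1, 0), (11, 0)]
print(distortion_and_inertia(pts, labels, centres))  # (1.0, 4.0)
```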

We then iterated k from 1 to 10 and calculated the distortion and inertia for each value of k. Both decrease as k grows, so the idea is not simply to minimise inertia (the dispersion among members of the same cluster) but to find the point beyond which adding clusters no longer reduces it substantially (Fig. 8).

Figure 8. Number of clusters selection for ‘Enhanced control of customers in online banking’

To determine the optimal number of clusters, we selected the value of k at the ‘elbow’, i.e. the point after which the distortion/inertia starts decreasing in a linear fashion. For the given data, we concluded that the optimal number of clusters is 4. We then applied k-modes with k = 4 and analysed the results obtained.
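The elbow was chosen here by inspecting Fig. 8. One common way to automate that visual judgement is to scale both axes to [0, 1] and pick the k whose point lies farthest below the straight line joining the first and last points of the curve. The sketch below shows this heuristic as an illustrative alternative, not the procedure used in the project. The cost values are made up and are not those of Fig. 8.

```python
def elbow(ks, costs):
    """Return the k whose point lies farthest below the straight line joining
    the first and last points of the cost curve, after scaling both axes to
    [0, 1]. Assumes costs decrease from costs[0] (max) to costs[-1] (min)."""
    k_first, k_last = ks[0], ks[-1]
    c_max, c_min = costs[0], costs[-1]
    best_k, best_gap = ks[0], float("-inf")
    for k, c in zip(ks, costs):
        x = (k - k_first) / (k_last - k_first)
        y = (c - c_min) / (c_max - c_min)
        gap = 1 - x - y  # proportional to the distance below the line y = 1 - x
        if gap > best_gap:
            best_k, best_gap = k, gap
    return best_k


ks = list(range(1, 11))
costs = [1000, 500, 260, 220, 190, 170, 155, 142, 131, 121]  # made-up costs
print(elbow(ks, costs))  # 3
```

Automating the choice this way can help when the number of clusters has to be re-selected often, for example by non-expert users, but a visual check remains advisable when the curve has no sharp bend.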

4.3.3 Visualisations

For this use case, we used a dynamically updated chart depicting the clusters into which the monitored transactions fall. The number of clusters is updated automatically to reflect new clusters detected by the processing pipeline (Fig. 9).

Figure 9. Sample of the I-BiDaaS graphical interface showing the identified clusters of incoming mobile-to-mobile bank transfers

4.3.4 Results

With this use case, I-BiDaaS allowed CaixaBank’s ‘Intermediate users’ and ‘Non-IT users’ to modify the number of clusters and run the algorithm over a selected dataset of transactions quickly and easily. It was used to explore clients’ mobile-to-mobile transaction patterns and to identify anomalies in the authentication methods and potential fraud, with fast, visual analysis of the results in the platform (Fig. 10).

Figure 10. Sample of the ‘Enhanced control of customers in online banking’ use case clustering results in the I-BiDaaS platform

The results were reviewed with employees of CaixaBank’s Digital Security department and Security Operation Centre (SOC) to determine whether the clustering algorithm helped identify potential errors in our automated authentication mechanisms for mobile-to-mobile bank transfers. The resulting clusters were useful for identifying the different patterns of mobile-to-mobile bank transfers and for reconsidering how we select the authentication method required to proceed with a transfer. I-BiDaaS tools made it easier for the SOC employees and the Digital Security department to analyse and identify bank transfer patterns for which a higher level of security would be beneficial. The rules for applying authentication mechanisms to those patterns were redefined, and a more restrictive authentication mechanism was applied to around 10% of mobile-to-mobile bank transfers (in a first iteration).

4.4 Relation to the BDV Reference Model and the BDV Strategic Research and Innovation Agenda (SRIA)

The described solution can be contextualised within the BDV Reference Model defined in the BDV Strategic Research and Innovation Agenda (BDV SRIA) [1] and contributes to the model in the following ways. Specifically, the work is relevant to the following BDV Reference Model horizontal concerns:

Data visualisation and user interaction: We develop several advanced and interactive visualisation solutions applicable in the banking sector, as illustrated in Figs. 3, 5 and 9.

Data analytics: We develop data analytics solutions for the three industrial use cases in the banking sector, as described in Sects. 4.1–4.3. While the solutions may not correspond to state-of-the-art advances in algorithm development, they clearly contribute to revealing novel insights into how Big Data analytics can improve banking operations.

Data processing architectures: We develop an architecture, shown in Fig. 1, that is well suited to banking applications requiring both batch analytics (e.g. analysing historical data) and streaming analytics (e.g. online processing of new transactions). A novelty of the architecture is the incorporation of realistic synthetic data fabrication and the definition of scenarios of usefulness and quality assurance for the corresponding synthetic data.

Data protection: We describe in Sect. 5 how data tokenisation and realistic synthetic data fabrication can be used in banking applications to allow more agile development of Big Data analytics solutions.

Data management: We present innovative ways of managing data using efficient multidimensional indexing, as described in Sect. 4.3.

Regarding the BDV Reference Model vertical concerns, the work is relevant to the following:

Big Data Types and Semantics: The work is mostly concerned with structured data, metadata and graph data. It contributes to the generation of realistic synthetic data from the corresponding domain-defined metadata.

Cybersecurity: The presented solutions that include data tokenisation are novel best-practice examples of securely sharing sensitive banking data outside bank premises.

Therefore, in relation to BDV SRIA, we contribute to the following technical priorities: data protection, data processing architectures, data analytics, data visualisation and user interaction.

Finally, the chapter relates to the following cross-sectorial technology enablers of the AI, Data and Robotics Strategic Research, Innovation and Deployment Agenda [18], namely: Knowledge and Learning; Reasoning and Decision Making; and Systems, Methodologies, Hardware and Tools.

5 Lessons Learned, Guidelines and Recommendations

CaixaBank, like many entities in critical sectors, was initially very reluctant to use any Big Data storage or tool outside its premises. To overcome that barrier, CaixaBank’s main goal when enrolling in the I-BiDaaS project was to find an efficient way to perform Big Data analytics outside its premises, which would speed up the process of granting new external providers access to CaixaBank data (usually a bureaucratic process that takes weeks or even a month). Additionally, CaixaBank wanted to be much more flexible in producing proof-of-concept (PoC) developments (i.e. testing the performance of new data analytics technologies to be integrated into its infrastructure). Usually, for any new technology test, however small, any hardware that is needed must be arranged through the infrastructure management subsidiary that will eventually deploy it. Owing to the size and complexity of the whole CaixaBank infrastructure and its rigid security assessment processes, such a deployment can take months.

For those reasons, CaixaBank wanted to find ways to bypass these processes without compromising the security of the entity or the privacy of its clients. The General Data Protection Regulation (GDPR) Footnote 8 strictly limits the usage of bank customers’ data, even when it is used for potential fraud detection and prevention and for enhancing the security of customers’ accounts. The data can be used internally to apply certain security policies, but how to share it with other stakeholders is still an open issue. Furthermore, the banking sector is strictly regulated, and national and European regulators supervise all the security measures the bank takes to provide a good level of security for the entity while maintaining the privacy of its customers at all times. The current trend of externalising many services to the cloud also requires strict control over where the data is located and who has access to it for each migrated service.

The I-BiDaaS CaixaBank roadmap (Fig. 11) had a turning point at which the entity completely changed its approach, moving from refusing to share real data to looking for the best possible way to share real data and perform Big Data analytics outside its facilities. I-BiDaaS helped to push for internal changes in policies and processes and to evaluate tokenisation as an enterprise standard for extracting data outside CaixaBank premises, breaking down both internal and external data silos.

Figure 11. CaixaBank roadmap in I-BiDaaS project

Results obtained from the first use case validated the usage of rule-based synthetically generated data and indicated that it can be very useful for accelerating the onboarding of new data analytics providers (consultancy companies and tools). CaixaBank validated that such data could be used as high-quality testing data outside CaixaBank premises for testing new technologies and PoC developments, streamlining access grants for new external providers to these developments and thus reducing the time needed to access data from an average of 6 days to 1.5 days. This analysis was beneficial for CaixaBank’s purposes, but it was also concluded that analysing rule-based fabricated data does not yield new insights beyond the models and rules used to generate it.

The other two use cases focused on how extremely sensitive data can be tokenised so that real data can be extracted for use outside CaixaBank premises. By tokenising, we mean encrypting the data and keeping the encryption keys in a secure data store that always resides in CaixaBank facilities. With this approach, the analysis is always performed on encrypted data, which can still limit its results. One of the challenges is therefore to encrypt the data in a way that loses as little relevant information as possible. The experimentation in use cases 2 and 3 was performed on tokenised datasets built with three different data encryption algorithms: (1) format-preserving encryption for categorical fields, (2) order-preserving encryption for numerical fields and (3) a Bloom-filter-based encryption process for free-text fields. This enabled CaixaBank to extract the dataset, upload it to the I-BiDaaS self-service Big Data analytics platform and analyse it with the help of external entities, without being limited by the corporate tools available inside CaixaBank facilities. The I-BiDaaS beneficiaries performed unsupervised anomaly detection in these use cases, identifying a set of anomalous patterns that were further checked by CaixaBank’s Security Operation Center (SOC), which helped increase CaixaBank’s level of financial security. Beyond that, we consider this experimentation very beneficial and believe it should be replicated with other commercial Big Data analytics tools prior to their acquisition.

The main benefits CaixaBank obtained from its participation in I-BiDaaS (highlighted in Table 2) relate directly to the evaluation of the different requirements presented in Sect. 2 (Table 1).

We were able to speed up the implementation of Big Data analytics applications (R1), test algorithms outside CaixaBank premises (R2) and test new tools and algorithms without data privacy concerns by exploring and validating the usage of synthetic and tokenised data (R3) in three different use cases, improving efficiency in time and cost (R5, R6, R7) by skipping some data access procedures and using new tools and algorithms in a much more agile way. User requirements regarding the ability of ‘Intermediate and Non-IT users’ to analyse and process the data of the use cases were also validated through several internal and external workshops Footnote 9, in which attendees from several CaixaBank departments and other external entities (data scientists, business consultants, IT and Big Data managers) provided very positive feedback about the platform’s usability. Moreover, as mentioned previously, use cases 2 and 3 were also validated by the employees of the corresponding business processes, who were able to extract the results themselves.

Last but not least, it is important to highlight that these results should be applicable to any other financial entity that faces the same challenges and tries to overcome the limitations imposed by data privacy regulation, the common lack of agility of large-scale on-premise Big Data infrastructures and very rigid but necessary security assessment procedures.

6 Conclusion

The digitalisation of the financial sector and the exploitation of the huge amount of sensitive data collected and generated by financial entities every day make their Big Data infrastructure very difficult to manage and slow to integrate innovative solutions. The I-BiDaaS integrated platform provided a solution for managing this infrastructure in a much more user-friendly manner and made Big Data analytics much more accessible to bank employees with less technical and data science knowledge. It also explored ways to reduce the friction between data privacy regulation and the exploitation of sensitive data for other purposes, showcasing them in several use cases on enhancing an entity’s cybersecurity and preventing fraud against its clients.





TDF: https://www.ibm.com/il-en/marketplace/infosphere-optim-test-data-fabrication

Hecuba: https://github.com/bsc-dd/hecuba

APAMA: https://www.softwareag.com/corporate/products/apama_webmethods/analytics/overview/default.asp





The work presented in this chapter is supported by the I-BiDaaS project, funded by the European Union’s Horizon 2020 research and innovation programme under Grant Agreement 780787. This publication reflects the views only of the authors, and the Commission cannot be held responsible for any use which may be made of the information contained therein.

Author information

Authors and affiliations.

Aegis IT Research LTD, London, UK

Andreas Alexopoulos, Spiros Fotis, Leonidas Kallipolitis & Ilias Spais

Barcelona Supercomputing Center, Barcelona, Spain

Yolanda Becerra, Cesare Cugnasco, Miquel Martínez & Raül Sirvent

IBM, Haifa, Israel

Omer Boehm & Michael Vinov

Information Technology for Market Leadership, Athens, Greece

George Bravos & Vasilis Chatzigiannakis

Ecole des Ponts ParisTech, Champs-sur-Marne, France

Giorgos Demetriou, Vlatka Katusic & Hernan Ruiz-Ocampo

University of Manchester, Manchester, UK

Iliada Eleftheriou, Evangelia Kavakli & Rizos Sakellariou

Faculty of Sciences, University of Novi Sad, Novi Sad, Serbia

Lidija Fodor, Dusan Jakovetic, Nemanja Milosevic & Srdjan Skrbic

Technical University of Crete – School of Electrical and Computer Engineering, Crete, Greece

Sotiris Ioannidis

Foundation for Research and Technology, Hellas – Institute of Computer Science, Crete, Greece

Sotiris Ioannidis, Despina Kopanaki, Christoforos Leventis & Giorgos Vasiliadis

CaixaBank, Valencia, Spain

Mario Maawad Marcos & Ramon Martin de Pozuelo

ATOS, Madrid, Spain

Enric Pere Pages Montanera

Software AG, Darmstadt, Germany

Gerald Ristow


Corresponding author

Correspondence to Ramon Martin de Pozuelo.

Editor information

Editors and affiliations.

Insight SFI Research Centre for Data Analytics, NUI Galway, Ireland

Dr. Edward Curry

Information Centre for Science and Technology, Leibniz University Hannover, Hannover, Germany

Prof. Dr. Sören Auer

SINTEF Digital, Oslo, Norway

Dr. Arne J. Berre

Paluno, University of Duisburg-Essen, Essen, Germany

Dr. Andreas Metzger

Universidad Politécnica de Madrid, Boadilla del Monte, Madrid, Spain

Prof. Maria S. Perez

Siemens Corporate Technology, München, Germany

Prof. Sonja Zillner

Rights and permissions

Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License ( http://creativecommons.org/licenses/by/4.0/ ), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.


Copyright information

© 2022 The Author(s)

About this chapter

Cite this chapter.

Alexopoulos, A. et al. (2022). Big Data Analytics in the Banking Sector: Guidelines and Lessons Learned from the CaixaBank Case. In: Curry, E., Auer, S., Berre, A.J., Metzger, A., Perez, M.S., Zillner, S. (eds) Technologies and Applications for Big Data Value . Springer, Cham. https://doi.org/10.1007/978-3-030-78307-5_13


DOI : https://doi.org/10.1007/978-3-030-78307-5_13

Published : 29 April 2022

Publisher Name : Springer, Cham

Print ISBN : 978-3-030-78306-8

Online ISBN : 978-3-030-78307-5

eBook Packages : Computer Science Computer Science (R0)
