What is data value?
The value of data is easy to see, right? For us data nerds, the answer is an emphatic yes. But that is not always the case for everyone within your organization. Data is used to understand and improve business processes, which increases operational efficiency. Ultimately, it drives revenue and saves both time and money. How much clearer could the business value be?
Unfortunately, many factors contribute to the perceived and actual effectiveness of data within organizations. A Statista report covering the leading challenges of using data to drive business value shows a wide variety of results. These can be grouped into two main buckets, each discussed in depth below:
- Data challenges
- Organizational/cultural challenges
Outside of these challenges, organizations are also tasked with splitting their time on data defense and data offense. Both are necessary for high data performance. Both are also going to impact the way that organizations see the value of their data.
There is a lot at stake here. The data value problem is complex and has multiple dimensions - from a growing necessity to protect data privacy to the impact this protection has on data quality. It is also a problem born from the unlimited potential impact that data can have on an organization's profits, innovation, efficiency, and additional revenue streams. Changes to data management made at the top are going to have major implications downstream, so data strategies and goals will need to be prioritized. Many organizations around the world are having this important conversation, and MOSTLY AI is dedicated to being a part of it.
So… where does this conversation begin? Below are the questions, insecurities, and important topics we hear most often from clients and prospects.
Data defense: the foundation of responsible and effective data usage
Data defense is a strategic approach to data management that aims to “ensure data security, privacy, integrity, quality, regulatory compliance, and governance”. This is a huge umbrella that encompasses multiple business functions and C-level positions. It also reflects the emphasis that organizations are placing on a strong defensive strategy. This emphasis comes from two main sources: the higher risk that comes with data breaches, and a growing body of data privacy legislation. In 2019, the average total cost of a data breach was $3.92 million USD. This cost includes liability for the damages - more specifically, the cost of patching the vulnerability, compensating victims, and expenses related to litigation. It also includes costs associated with the negative impacts on customer trust and brand reputation. This is a massive potential hit, which is why such great measures are being taken to avoid it. Finally, according to Gartner, over 75% of the world's population will have its personal data covered under modern privacy regulations. The need for strong data defense will only grow as data breaches continue to increase and evolve, and as legislation expands around the globe to protect personal data.
The complexity of data defense brings with it many challenges in demonstrating the business value of data. The data challenges include data quality issues, managing compliance, and not knowing what data exists. The organizational challenges include data access, data sharing capabilities, and data cohesion across multiple functions. These challenges show up as regulatory violations, data-driven projects that underperform because of data quality issues, lengthy wait times for data that stall progress on these projects, internal disorganization, and frustration felt at all levels of the organization.
Clearly, many organizations, especially those in industries that handle large amounts of PII/PHI (Personally Identifiable Information/Protected Health Information), are going to spend a lot of time and resources on data defense. This investment of time, energy, money, and staff needs to show clearly what business value it adds. For that to happen, there have to be answers to the challenges mentioned above.
While there is no one answer to solve these problems, MOSTLY AI is working to drive business value while simultaneously easing the burden of the data/organizational challenges that are prevalent inside so many organizations.
The most common data challenges
Data challenges are felt across the many job functions that handle internally stored data. Managing data compliance makes privacy protection an absolute necessity. Right now, “more than 70% of employees have access to data they should not”. Commonly used traditional anonymization methods (data masking, pseudonymization, permutation, randomization, and generalization) are not fully effective in guaranteeing privacy protection, and they also significantly degrade the quality of the data. This affects the downstream tasks that rely on the integrity of the original data in order to perform up to expectations.
MOSTLY AI's synthetic data generator allows for a safer, smarter, and faster way of protecting sensitive data. While guaranteeing zero risk of re-identification, our synthetic data is also able to maintain a very high degree of accuracy of the original data which it is created from. This means that the data looks, feels, and behaves the exact same way as the original data, without any risk of compliance violations. It means that synthetic data used downstream will meet/exceed expectations.
Organizational challenges across industries
Organizational challenges are also felt by everyone in the organization who works with data. According to Harvard Business Review, “Cross-industry studies show that, on average, less than half of an organization’s structured data is actively used in making decisions”. This speaks to the highly prevalent issue that organizations face in accessing data. Contributing factors include lengthy approval processes for sharing data internally (due to data sensitivity), and a lack of understanding of what is in the data and what data is relevant to whom (due to data volume). Another big challenge here is organizational silos, which contribute to a lack of data cohesion.
While there are many potential solutions, synthetic data is essential to easing the burden of these problems. At MOSTLY AI, one of our main use cases is enhanced data sharing. Because synthetic data is completely privacy safe, the approval processes for sharing data internally are vastly simplified. This not only allows everyone within the organization to access the same high quality data, but also allows them to do so more quickly. Fully accessible data will also have a massive impact on the ability of organizations to become cohesive in their data strategy and utilization. MOSTLY AI also allows users to downsample during synthetic generation, creating a dataset that is not muddied by high levels of irrelevant data.
How to demonstrate data value?
Demonstrating value is possible within data defense. Once the burden of the challenges that make this difficult is eased, there is plenty of opportunity to add value. One way is through increased operational efficiency, using MOSTLY AI generated synthetic data. With more uniform, fully privacy safe data, access will no longer take weeks to months. This data access applies not only to the business unit or team that the data originated in, but across the entire organization. Take, for example, Telefónica - a customer of ours who had large amounts of data locked away and unavailable to the analytics team. After adopting MOSTLY AI synthetic data generation, millions of records could be used in a GDPR-compliant way to power their analytics and AI projects. Examples like this carry so much power. They mean that projects will not be stalled due to a lack of relevant data or long wait times. Decreased time to data, and the implications of using this “real-time” data downstream, can be measured and clearly demonstrated to the highest levels of leadership within the organization.
Value can also be seen in the reduction of risk associated with using synthetic data. How many of your downstream projects contain original production data? Comparing the number of data compliance violations (and the costs associated with them) before and after adopting synthetic data will paint a clear picture of the value of your data from a risk reduction standpoint.
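As a rough illustration of how these before-and-after comparisons could be reported, here is a minimal Python sketch. The `Period` structure, the function names, and every number in it are made-up placeholders for illustration only; they are not MOSTLY AI or customer figures.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Period:
    """Observations for one reporting period (hypothetical structure)."""
    days_to_data: list[float]     # days each data request waited for access
    violation_costs: list[float]  # cost of each recorded compliance violation


def relative_change(before: float, after: float) -> float:
    """Fractional change from before to after; negative means a reduction."""
    if before == 0:
        raise ValueError("baseline must be non-zero")
    return (after - before) / before


def summarize(before: Period, after: Period) -> dict[str, float]:
    """Compare median time to data and total violation cost across periods."""
    return {
        "median_days_to_data_change": relative_change(
            median(before.days_to_data), median(after.days_to_data)
        ),
        "violation_cost_change": relative_change(
            sum(before.violation_costs), sum(after.violation_costs)
        ),
    }


# Placeholder inputs, invented purely to show the calculation.
before = Period(days_to_data=[30, 45, 60], violation_costs=[10_000.0, 5_000.0])
after = Period(days_to_data=[2, 3, 4], violation_costs=[3_000.0])

report = summarize(before, after)
for metric, change in report.items():
    print(f"{metric}: {change:+.1%}")
```

The median is used for time to data so that a single unusually slow request does not dominate the comparison; an organization might reasonably prefer a mean or a percentile instead.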
Another defensive activity is protecting your organization against fraud, money laundering, theft, or other anomaly scenarios. Building analytical models that detect warning signals within customer data and alert your organization can result in significant cost savings and help maintain credibility with your customers. These models help reduce false positives and detect new fraud/anomaly cases, but their performance is highly dependent on the quality of the training data. Using MOSTLY AI generated synthetic data, our customers can upsample fraud patterns in order to boost machine learning performance and decrease false positives. More simply, our customers can trust that synthetic data will enhance fraud/anomaly model performance. This is another way to demonstrate the value of data while on defense.
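To make the idea of upsampling a rare class concrete, here is a minimal Python sketch. It uses plain random oversampling (duplicating existing fraud rows), which is a much simpler stand-in and not the synthetic upsampling described above. The function name, the `is_fraud` field, and the toy records are all assumptions made for illustration.

```python
import random


def upsample_minority(rows, label_key="is_fraud", target_ratio=0.5, seed=0):
    """Duplicate minority-class rows (sampling with replacement) until they
    make up roughly target_ratio of the returned dataset."""
    if not 0 < target_ratio < 1:
        raise ValueError("target_ratio must be between 0 and 1")
    rng = random.Random(seed)
    minority = [r for r in rows if r[label_key]]
    majority = [r for r in rows if not r[label_key]]
    if not minority:
        raise ValueError("no minority-class rows to upsample")
    # Minority rows needed so that minority / (minority + majority) ~= target_ratio.
    needed = round(target_ratio * len(majority) / (1 - target_ratio))
    extra = max(0, needed - len(minority))
    combined = majority + minority + rng.choices(minority, k=extra)
    rng.shuffle(combined)
    return combined


# Toy data: 100 transactions, 5 of them flagged as fraud (placeholder values).
rows = [{"amount": 10.0 * i, "is_fraud": i % 20 == 0} for i in range(100)]
balanced = upsample_minority(rows)

fraud_share = sum(r["is_fraud"] for r in balanced) / len(balanced)
print(len(balanced), fraud_share)
```

Whatever the upsampling method, it should be applied to the training split only, so that the model is still evaluated on data with the real-world fraud rate.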
Data offense: the driver of revenue, profitability, and customer satisfaction
When thinking about sports, everyone loves watching their team while they are on offense. It is where the magic happens, where the points are scored, and where they are able to get creative. This is also where data leadership within organizations gets excited. Data offense is absolutely essential for organizations in standing out from competition. It also plays a big role in allowing for growth, and making shareholders happy. As a result, there are a lot of eyes on the way organizations are using data offensively. On top of that, there are also growing expectations surrounding the business value that data can add.
Many of the challenges that organizations face in proving the business value of data in data defense also appear in data offense. If these challenges were not dealt with during defensive activities, organizations will face low data quality, difficulty accessing relevant data, and low operational efficiency in their offensive efforts. These issues will hurt the perceived value of an organization's data, and will also cost revenue, profit, and customer satisfaction that could otherwise have been generated. This emphasizes the need to address these challenges, and to do so quickly.
Driving offense with synthetic data
Offense strategies consist of using data to support the business objectives of increased revenues, customer satisfaction, and profits. Each of these objectives requires rich analytics and smart/accurate models. As discussed previously, the success of these projects will depend on the data that is informing and empowering them. You could use unprotected, original data that carries major compliance risks with it. You could also use data that has lost referential integrity after the time-consuming process of manual anonymization, masking, generalization, pseudonymization, etc. Unfortunately, both of these strategies require compromise - you either lose data quality (which is vitally important for building smart/accurate models), or you turn a blind eye to privacy protection legislation that is only growing more prevalent and restrictive.
MOSTLY AI recognizes that data can be an organization’s greatest asset. We also recognize that certain things shouldn’t be compromised when using data. Synthetic data allows for smart, safe, and powerful data usage by ensuring privacy protection without loss in data quality. Below are brief summaries of how organizations are currently using synthetic data to increase revenue, customer satisfaction, and profits.
Increase of revenue, customer satisfaction, and profits using synthetic data
Driving revenue with data has many different forms. It could be utilizing the data for marketing/sales insights that lead to increased customer acquisition, or the monetization of data. Our CEO, Tobi Hann, discusses a particular data monetization use case which highlights our unique data sharing capabilities to drive revenue.
Synthetic data also enables product teams to work with customer data that looks, feels, and behaves the exact same way as the original data, but in a privacy safe way. This results in the creation and deployment of highly personalized products and services for the end user. The value of the data then shows up clearly in KPIs such as lower customer churn and more new customers won (i.e. higher customer satisfaction). More happy customers means more revenue.
Profitability is a goal that every organization shares. Higher profits can be achieved by analyzing data to optimize pricing which balances risk and profitability. Finding a maximum price point that does not drive customers away has to be done using high quality customer data. It also has to be done using highly accurate and smart models. Feeding these models with data that can be rebalanced, augmented, or downsampled, based on the scenario, enables increased performance. Higher profits, derived from data, will clearly demonstrate the value of that organization's data.
| | Original or Traditionally Anonymized Data | AI-generated Synthetic Data |
| --- | --- | --- |
| Data Defense | Many potential compliance violations; long wait times for data access; data quality issues for downstream usage, after protection | No risk of compliance violations; data consistency across multiple business units; ability to enhance model performance for anomaly situations (fraud, laundering, theft, etc.) |
| Data Offense | Either using non-protected or slightly dated data for offensive activities; low data quality has major implications (monetarily) on performance of offensive activities such as pricing models, or drawing insights from sales/marketing activities | Fully privacy safe and “real-time” data to enhance offensive activities; fully representative data with no decrease in data quality; customizable data in order to increase model performance based on the situation |
Don't compromise on value
Right now, organizations are having to compromise on either data quality or data privacy protection. Data quality is a must-have in order to enhance all downstream tasks. Data privacy protection is a must-have in order to stay compliant in a world that is committed to protecting personal data.
MOSTLY AI generated synthetic data allows for data usage without compromise.
It is a tool that helps solve challenges organizations face in demonstrating the business value of their data.
It is a tool that helps that value shine even brighter.
It is a tool that can and should be used in both offensive and defensive data strategies.
It is a technological solution that encourages and facilitates a positive change in data culture and performance.
It is something that I would love to talk with you about, if there is any interest or curiosity. Feel free to send me an email at firstname.lastname@example.org, and we can find time to chat.