About MOSTLY AI Archives

What is synthetic data?

In this post, I will review the landscape of synthetic data companies. But first, what is synthetic data? Synthetic data is an artificial version of your real data created algorithmically. It looks and feels like real data and can be used for the same purposes. Synthetic data should not be confused with mock data; it retains the structure and statistical properties (including correlations) of your real data.
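The difference between mock data and synthetic data can be illustrated with a toy sketch. It is not how any particular synthetic data product works: real generators typically rely on far richer generative models, while this example assumes a simple two-column Gaussian model. The "real" table, the column meanings (age, income), and all function names are hypothetical. Mock data draws each column independently and loses the relationship between columns, whereas the toy synthetic sampler reproduces the fitted correlation.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def mean_std(xs):
    """Mean and (population) standard deviation of a list."""
    m = sum(xs) / len(xs)
    return m, math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))


def make_real(rng, n):
    """Hypothetical 'real' table: income rises with age, plus noise."""
    ages = [rng.gauss(40, 10) for _ in range(n)]
    incomes = [1000 * a + rng.gauss(0, 8000) for a in ages]
    return ages, incomes


def make_mock(rng, ages, incomes):
    """Mock data: each column drawn independently, so correlation is lost."""
    (ma, sa), (mi, si) = mean_std(ages), mean_std(incomes)
    n = len(ages)
    return ([rng.gauss(ma, sa) for _ in range(n)],
            [rng.gauss(mi, si) for _ in range(n)])


def make_synthetic(rng, ages, incomes):
    """Toy synthetic data: sample from a fitted bivariate Gaussian."""
    (ma, sa), (mi, si) = mean_std(ages), mean_std(incomes)
    rho = pearson(ages, incomes)
    syn_ages, syn_incomes = [], []
    for _ in range(len(ages)):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        syn_ages.append(ma + sa * z1)
        # Mixing z1 into the second column reproduces the fitted correlation.
        syn_incomes.append(mi + si * (rho * z1 + math.sqrt(1 - rho ** 2) * z2))
    return syn_ages, syn_incomes


rng = random.Random(42)  # fixed seed so the run is repeatable
real = make_real(rng, 5000)
mock = make_mock(rng, *real)
synthetic = make_synthetic(rng, *real)

print(f"real      corr: {pearson(*real):.2f}")
print(f"mock      corr: {pearson(*mock):.2f}")
print(f"synthetic corr: {pearson(*synthetic):.2f}")
```

In this sketch, the mock table matches each column's mean and spread but its correlation should land near zero, while the synthetic table's correlation should stay close to that of the real table.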

Why have synthetic data companies emerged recently?

Several factors are impacting the synthetic data landscape:

The increasing demand for artificial intelligence (AI) applications that require large and diverse datasets for training and validation.
Growing awareness of ethical and legal issues associated with real data, such as privacy, consent, and bias.
The development of advanced technologies and algorithms capable of generating realistic and high-quality synthetic data.

Until recently, synthetic data was viewed as a substitute or backup for real data. However, with recent advances in generative AI, synthetic data can now match or even surpass the quality of real data. According to Gartner, by 2030 synthetic data will completely overshadow real data in AI models.

What problems do synthetic data companies solve?

Synthetic data companies address various pain points in different industries and use cases, including:

The lack of sufficient or relevant real data for training and testing AI models, especially in complex or rare scenarios.
High costs and time associated with collecting, labeling, and processing real data.
The risk of exposing sensitive or personal information from real data, leading to privacy breaches, legal liabilities, or ethical dilemmas.

By generating synthetic data that closely resembles real data but contains no identifiable information, these companies help overcome these challenges, enabling faster, cost-effective, and safer AI development and deployment.

The world's leading tabular synthetic data generator

MOSTLY AI's free version offers a great, easy way to explore tabular synthetic data generation without having to go through lengthy processes or coding sessions. Check out the world's most accurate synthetic data generator yourself, or book a personalized demo!

Structured vs. unstructured synthetic data

Synthetic data can be structured or unstructured, depending on its type and purpose:

Structured data has a defined format and clear relationships between data points, typically stored in tabular form, such as in Excel files or SQL databases.
Unstructured data lacks a predefined structure, making it more challenging to analyze using traditional methods. Examples include images, videos, transcripts, and emails.

Use cases for structured synthetic data

Structured synthetic data finds applications for example in:

Use cases for unstructured synthetic data

Unstructured synthetic data is used for example in:

Natural language processing (NLP) tasks.
Computer vision tasks.
Clinical decision support systems.

Funding for synthetic data companies

Here is a list of structured and unstructured synthetic data companies along with their funding:

Structured synthetic data companies

#	Name	Funding ($ millions)
1	Accelario	15.6
2	AiDrome	-
3	Betterdata	2.4
4	Clearbox AI	0.79
5	CloudTDMS	-
6	CNAI	-
7	Curiosity	-
8	Datacebo	-
9	DataCo	-
10	Datomize	6
11	Datamaker	0.073
12	ExactData	-
13	Facteus	15.1
14	Fairgen	2.5
15	FinCrime Dynamics	0.758
16	Gretel.ai	67.7
17	HAZY	14.8
18	Howso (formerly Diveplane)	34
19	Kymera Labs	0.15
20	K2view	-
21	MDClone	104
22	Mirry.ai	-
23	MOSTLY AI	31.1
24	Octopize MD	1.5
25	Replica Analytics	1
26	Sarus	2.17
27	Statice	-
28	Syndata	0.245
29	Syntheticus	-
30	Synthesized	2.8
31	Syntho	1.22
32	Tonic.AI	45
33	Truata	0.05
34	Veil.ai	1.41
35	Ydata	3.2
	Total funding ($ millions)	370.2

Funding for structured synthetic data companies

Unstructured synthetic data companies

#	Name	Funding ($ millions)
1	AI Reverie	5.8
2	Anyverse	0.972
3	Bifrost	3.5
4	CVEDIA	-
5	Coohom Cloud	-
6	Datagen	-
7	Dazzle AI	-
8	Deep Vision Data	-
9	EdgeCase	-
10	Elevenlabs	21
11	Kroop AI	0.034
12	Lexset	1
13	Midjourney	-
14	Mindtech	10.1
15	Neurolabs	4.9
16	Parallel domain	43.9
17	Rendered AI	6
18	Scale Synthetic	-
19	Sky Engine	2
20	Synthesis AI	21.5
21	Synthetik	1.9
22	Vypno	-
23	Zumo Labs	0.15
	Total funding ($ millions)	121.1

Funding for unstructured synthetic data companies

Synthetic data companies in the structured and unstructured synthetic data space

Acquisitions of synthetic data companies

As of August 2023, there have been four publicly known acquisitions:

#	Name	Acquired by	When
1	AI.Reverie	Facebook	2021
2	Replica Analytics	Aetion	2022
3	Statice	Anonos	2022
4	Logiq.ai	Apica	2023

Acquisitions of synthetic data companies as of 2023

The future of synthetic data companies

As AI technologies advance, the role of synthetic data in AI development will evolve. Synthetic data companies have a promising future driven by the increasing demand for high-quality data. To ensure the quality and compliance of synthetic data, companies must refine their data synthesis methods and address challenges related to privacy, diversity, and cost-effectiveness.

The upcoming AI Act in Europe highlights the importance of synthetic data in AI development, particularly in addressing data privacy and quality issues. Synthetic data companies are poised to play a crucial role in this regulatory landscape.

Conclusion

Synthetic data companies have ushered in an era of responsible and ethical AI development. They have addressed data scarcity, privacy concerns, and model bias, offering a reliable and privacy-conscious alternative to mock data. As AI continues to advance, synthetic data's potential to mitigate bias, enhance model robustness, and reduce costs will become increasingly indispensable.

In a world of tightening data privacy regulations, synthetic data will continue to ensure ethical and legal AI development. The promises and possibilities ahead are boundless as we embrace a data-driven future.

If you need to generate structured synthetic data, check out MOSTLY AI's synthetic data generator!

Back in 2017, we set out to enable organisations to thrive ethically and responsibly with smart and safe synthetic data. Since then, we have been building our product, whilst also dedicating a significant part of our time to creating blog posts, podcasts, and tutorials that enable our users to make the most of it. After all, "a tool is only as good as its users." To say that our messaging, the format of our messages, and the senders of these messages are diverse would be an understatement.

On Monday, we learn about data simulation from our AI & Machine Learning Product Owner.

On Tuesday, we hear about insurance innovation by our Head of Customer Experience.

By Wednesday, we are excited about synthetic data use cases beyond privacy.

Come Friday, our Legal Counsel will guide you through the EU's AI Act, how it was born, and how it will come into effect.

Our content is like a ~~cake~~ salad: so ~~delicious~~ much variety, so many possibilities for customization, so rich! Just pick the ingredient(s)/topic(s) you like the most and indulge yourself, no guilty pleasures attached.

To say that we were proud of our product and the content surrounding it would be an understatement. That was until we realized something was missing. 🤦🏽‍♂️

While we were making a tremendous amount of information available to everyone in the company through our open feedback culture, monthly OKRs, and reporting, we noticed that when it came to speaking to our community, we were only showcasing the final result, our product. We weren't publicly sharing — which is not the same as hiding — the backbone that defines us: our culture.

It’s not just about us 👋🏽

MOSTLY AI is more than just the fearless folks researching, designing, coding, promoting, and selling it. It also belongs to all those who believe data should empower all data consumers to build a smarter and fairer future together for the benefit of everyone. If we want Sarah (a Data Science student in Nigeria), Robin (an Innovation Champion in a Canadian bank), and Sam and Ezra (Machine Learning Engineers in American and Dutch enterprises) to be part of this community and help us democratize access to data, we think it's only fair that you get to know us better.

Plus, we're growing our team. Just as we appreciate the chance to review a CV and prepare in advance, we believe applicants should have the same opportunity. We recognize that applying for or accepting a new job is a leap of faith, but we aim to make future "Mostlies" feel comfortable and excited to apply. They should understand more about our company and culture beyond just the job description.

While we can't really take you on a date—that sort of relationship is better suited for Joaquin Phoenix in "Her"—we've decided to make our employee handbook public and accessible to all. Dive in and explore our core values, our approach to remote work, communication methods, perks, commitment to diversity, and everything that defines MOSTLY AI. It's our way of promoting openness and trust within our community, and we are as excited about it as a traveler is about exploring a new destination.

Note: While we advocate for transparency, we do protect information when legally required or when it's not entirely ours to disclose.

But this journey is also about us and who we want to be 🔜

Our handbook doesn't just define who we are now; it outlines who we aspire to be in the future. We believe in the power of transparency as a means to build trust, hold ourselves accountable, and propel our community forward. We don't have hundreds and thousands of wordy pages of policies and procedures that nobody reads. Instead, we have straightforward guidelines that Mostlies can use as a starting point and to complement one of our beloved principles: use common sense and make a fair/good judgment.

Having a public handbook also gives us a chance to reflect on who we actually are as a company, what we stand for, and how we should work. Writing about these aspirations in the handbook makes them more tangible and helps solidify them. Equally important, the handbook is editable: everyone can contribute to this living, breathing document with their experiences.

We love synthetic, not static 👷🏽

As there’s always room for improvement (except for dogs, who are perfect as they are), we are comfortable with the handbook’s imperfections because that means we can make it better together. And by that, we don’t mean correcting misspellings or reflecting team changes in the org chart, but actually:

From employees:
- pointing out when we write that we are about ABC, but in reality we're doing XYZ;
- flagging that something is assuming an ideal-case scenario rather than the real world;
- noticing that something important is missing;
From employees and everyone in our community:
- simply bringing in new ideas. Insights from different perspectives can help us identify blind spots, areas for improvement, and innovative solutions that might not have surfaced otherwise.

As we embark on this journey, we are excited about the potential benefits it holds for MOSTLY AI, our Mostlies, and the community at large. So please go ahead and explore it, soak in the startup spirit, and if you have any burning questions or just want to say "hi," please do drop us a line. We welcome and cherish your feedback!

Our employee handbook is a dashboard for everyone interested in seeing everything that makes MOSTLY AI tick. Do you want to take a peek inside? Come in, we are open.

As part of our Diversity, Equity, Inclusion, and Belonging (DEI&B) efforts, we just broke the "less is more" rule and added one new cultural value, Be YOU 💎, to our beloved set of core values at MOSTLY AI. According to Benton and Bradford Consulting, authenticity at work is the opposite of conformity.

While all our ‘Mostlies’ share some personality traits - for instance we’re all fearless and we have a fair split between Crocs lovers and haters - our team is committed to creating an environment where uniqueness is not just accepted, but rather celebrated.

But what exactly does it mean to “Be YOU”... can someone be rude in the morning because that’s just how they are? No, not really! Weird habits or odd quirks are okay, but ultimately what we really want is for everyone to feel safe enough to work with authenticity, and bring their originality to their role, their team, and our business.

As aptly put in this article, “Be yourself! But not if your default status is jerk. If expressing your true self means acting in opposition to your company’s commitment to corporate values and diversity, inclusion, and equity (DEI), then that part of your persona needs to stay home.” Authenticity aligned with organizational values is the sweet spot in the workplace.

Honesty is a key part of authenticity at work. However, there's a distinct difference between being brutally honest and being truthful with others. In her book "The Dance of Deception", psychologist Dr. Harriet Lerner distinguishes between these two concepts, arguing that honesty can sometimes represent our uncensored ideas and feelings, whereas truth requires tact, timing, and empathy with the other person.

Now that we have cleared up what it does not mean, let’s talk about what “Be YOU” stands for in the context of our work and careers at MOSTLY AI.

Authenticity at work means embracing who we truly are

At MOSTLY AI we give ourselves the freedom to live and work authentically. We don’t want our people to act differently around a manager or colleagues to impress them or to fit in. And we definitely don’t want our people to say what they think others want to hear, especially if this goes against their true nature.

Acting in line with others’ expectations of you can be exhausting and feel confining, and after a while it’s bad for performance. In turn, when we are authentic and live and work by our values, it can lead us to higher self-confidence and satisfaction. Furthermore, if we’re true to ourselves, others will respect us for standing by our values and beliefs. This fosters cooperation because it builds the strength and openness needed to deal with problems quickly, instead of procrastinating or ignoring them.

When you pay attention to the details, you’ll soon realize whether you are in a space where you’re able to express your authentic self. For example, it’s feeling comfortable ordering a green tea when you’re surrounded by a majority of coffee fanatics. Authenticity at work also means you feel safe raising your hand in front of the whole company.

Examining and challenging our beliefs

Authenticity at work also means that all of us should make an effort to show sensitivity towards others’ perspectives, views, and concerns. This behaviour promotes empathy, fosters mutual respect, increases understanding and reduces conflict.

“I like people who are not sure of themselves, the perplexed, the modest, those who try to understand.” – Ettore Sottsass

We try really hard not to make any assumptions about others. We let others' actions speak for themselves, and take their words at face value. And one of the best things about being open-minded with others is that they will most likely extend the same courtesy to us. So it’s a win-win situation!

Inspiring others

Together we can create a world where everyone feels comfortable being exactly who they are. We win as a team, and by inspiring others we can build strong and supportive teams and communities, where everyone feels included and connected and has a sense of belonging. This is how authenticity at work is ingrained in the culture of a company for the long term.

There’s also a multiplier effect to vouch for. When we inspire others, we motivate them to take action towards achieving their goals. This can help them to overcome obstacles and challenges that may be holding them back. And by inspiring others, we also grow. By sharing experiences and knowledge with others, we are forced to reflect on our own values and beliefs, which can help us to become more self-aware and improve our own lives.

Being ourselves

There's no one else in the world like us, and that's something to celebrate. When we trust ourselves and do what we know to be right, we can live to our full potential. Instead of letting others dictate what's best for us, we take control of our life and career, including our growth and development.

Authenticity at work in a remote-first setting

MOSTLY AI is a remote-first synthetic data company. To help embrace individuality, we’ve created a set of dedicated Slack channels as spaces for us to connect and share personal interests. These include channels for memes, random content, holiday photos, and favorite series / movies. We’ve also collaborated on a global recipe book and a playlist, which both highlight our personal tastes and backgrounds.

In addition, each week we feature a colleague as part of our ‘Get to know’ series. It’s an optional initiative where someone answers 10 or 12 questions from a pool of 30, giving other Mostlies an easy opportunity to get to know their colleague in a different light. Our Marketing team then turns this content into a fun one-pager for everyone to enjoy each Friday.

A step towards radical authenticity at work

By adopting this new value, we are not promising long-term happiness at work, but it's fair to say that it is good to understand our own values and strengths, and then find opportunities in our team where these will make a positive impact and help MOSTLY AI succeed. Diversity makes us stronger and smarter, which ultimately will allow us to deliver the best product.

Spend some time thinking about what makes you authentic, and be brave enough to let these unique attributes shine through at work. You may just surprise yourself, and others. Be YOU, because everyone else is already taken. And remember, it’s about consciousness, making authenticity a daily practice.

"Radical authenticity is a continuous, imperfect and ever-evolving practice of unlearning and letting go of what isn't your truth. It's trusting yourself to live, love, and lead from your heart." - Laura Brunton

Similarly, authenticity at work is always a work in progress.

Today we are super excited to share the news that we launched our Phantom Stock for All Program. Unlike in the US or the UK, there is no legal framework for employee stock options in Austria. Therefore, we had to construct a model that mirrors the behavior of a stock option plan, adjusted to the Austrian legal system. The result is our MOSTLY AI Phantom Stock Option Plan (PSOP).

Why did we launch a Phantom Stock for All Program?

Our founders, management team, and investors believe that we need a world-class team to achieve our mission of enabling organizations to thrive ethically and responsibly with smart and safe synthetic data, and that employee ownership pulls in top talent. Employees are looking for more than just work that pays the bills. They are looking for purpose and a way to change the world, and they want to participate in the value they create. Phantom stock can help us along the whole journey of being a member of the MOSTLY AI team.

Hiring Top Talent

Phantom stock is a great tool to attract top talent across the globe. While in some regions employees expect stock options, in others they might not even be aware of the concept. We want to reward every MOSTLY in a meaningful and fair way across the globe. As a remote-first company, we find it especially critical to create shared interest and to put all our people on the same track, no matter which part of the world they live in. The Phantom Stock for All Program creates a shared space where a global mission becomes a reality.

Keeping People on Board

We know that our talents are among the best in their respective fields and in high demand on the market. Therefore, we want to retain them and let them benefit from their hard work in the long run.

Motivation

Every MOSTLY has a stake in the company and understands and benefits from going the extra mile when needed. Yearly reviews and significant refreshers also reward individual and team performance.

Aligning Goals across Teams

We all share a common goal: the overall, long-term success of the company. This unites the teams and pulls everyone in the same direction. MOSTLIES are fearless (one of our core values): we explore uncharted territories, and if we succeed, we succeed together.

How does MOSTLY AI's Phantom Stock for All Program work?

Granting

We have granted every team member phantom options and will include phantom stock options in the offers for new joiners. This means that all our employees receive rights which entitle them to certain bonus payments by the company calculated on the basis of Phantom Shares in the case of a liquidity event, which can either be an IPO or an acquisition of the company.

Vesting

There are different types of vesting schedules possible, such as backloaded vesting or annual vesting. We opted for the employee-friendly “monthly vesting” option with a standard one-year cliff period. While many programs terminate vesting in case of long-term absence (longer than 30 days), we decided to continue vesting during absences of up to three months, and in case of family leave even up to 12 months.
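To make the vesting rules concrete, here is a minimal sketch of monthly vesting with a one-year cliff. The post does not state the total vesting period, so the 48-month default below is purely hypothetical, as are the function names. The post also does not say what happens to absences beyond the three-month (or twelve-month family leave) limit; the sketch assumes those extra months simply do not count toward vesting.

```python
def credited_months(months_since_grant, absence_months=0, family_leave=False):
    """Months that count toward vesting.

    Absences keep counting up to a cap: 3 months, or 12 months for family
    leave, as described in the post. ASSUMPTION: months of absence beyond
    the cap are simply not credited; the post does not spell this out.
    """
    cap = 12 if family_leave else 3
    uncounted = max(0, absence_months - cap)
    return max(0, months_since_grant - uncounted)


def vested_fraction(months_credited, cliff_months=12, vesting_months=48):
    """Fraction of a grant vested under monthly vesting with a cliff.

    cliff_months=12 matches the one-year cliff in the post.
    vesting_months=48 is a HYPOTHETICAL total period, not from the post.
    """
    if months_credited < cliff_months:
        return 0.0  # nothing vests before the cliff
    # At the cliff, all months served so far vest at once; after that,
    # one more month's share vests every month until fully vested.
    return min(months_credited, vesting_months) / vesting_months


if __name__ == "__main__":
    for months in (6, 12, 30, 60):
        print(months, vested_fraction(months))
    # Five months of regular absence: two of them exceed the 3-month cap.
    print(vested_fraction(credited_months(24, absence_months=5)))
```

Under these assumed parameters, nothing vests during the first eleven months, a quarter of the grant vests at month twelve, and the rest vests in equal monthly steps.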

Exercising

Austrian tax law (like many other European laws, for example in Germany) doesn't recognize a “Fair Market Valuation”, and therefore many companies offer strike prices based on the latest fundraising valuations. We also do that, but offer a considerable discount to make the program even more attractive for all of our team members.

Benefitting from the Option

We are all working towards a high valuation in case of a liquidity event so that everyone will benefit from their hard work and contribution. It’s so simple that we wonder why this isn’t how things are done everywhere.
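As a rough illustration of how a discounted strike price and a liquidity event could translate into a bonus payment, consider the sketch below. The post does not publish the actual payout formula, so this assumes a standard option-like payoff (exit price minus strike price, never below zero) per vested phantom option. All names and numbers are hypothetical.

```python
def strike_price(latest_round_price, discount):
    """Strike price derived from the latest fundraising price per share.

    discount is a fraction, e.g. 0.5 for 50 percent. The size of the
    discount is HYPOTHETICAL; the post only calls it "considerable".
    """
    if not 0 <= discount < 1:
        raise ValueError("discount must be in [0, 1)")
    return latest_round_price * (1 - discount)


def phantom_payout(vested_options, exit_price, strike):
    """Bonus payment at a liquidity event (IPO or acquisition).

    ASSUMPTION: an option-like payoff per vested phantom option, i.e. the
    gain above the strike price, floored at zero. This is not confirmed
    as the plan's actual formula.
    """
    return vested_options * max(0.0, exit_price - strike)


if __name__ == "__main__":
    strike = strike_price(latest_round_price=10.0, discount=0.5)
    print(strike)
    print(phantom_payout(vested_options=1000, exit_price=25.0, strike=strike))
    # If the exit price is below the strike, the assumed payout is zero.
    print(phantom_payout(vested_options=1000, exit_price=4.0, strike=strike))
```

The sketch also shows why a lower strike price is employee-friendly: for the same exit price, every unit of discount increases the payout per vested option.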

The mechanism of MOSTLY AI's Phantom Stock for All Program

Big Kudos to the team from Index Ventures for pulling together a comprehensive guide for European startups. When designing our PSOP system, we also took inspiration from Balderton Capital’s guide to employee equity.

Interested in shaping the future of synthetic data?

Check out our open roles to see how you can join the MOSTLY AI team.

Passionate advocates and fiery opposers have widely discussed the pros and cons of working remotely. At MOSTLY AI, we don't like to live inside boxes, but if we have to be put into one, please make some room for us in the first one and bring us some comfy pillows because we're here for good.

Articles about the impact of remote work on the labor market, environment, real estate, productivity, work-life balance, mental health, and so forth are all over the media, and the humble aim of this blog post is not to add more noise but rather to share our learn-on-the-job experience.

While some companies are channeling their human and financial resources into improving their office facilities to make them suitable for the new work dynamics, for MOSTLY AI, that door is closed, literally and metaphorically speaking. We are now focusing on bigger plans: enabling organizations to thrive ethically and responsibly with smart and safe synthetic data.

Our mission is bigger than the walls of our old office and the borders of our headquarters. The talent we need to deliver our mission might be building a family in the mountains of beautiful Colorado, living solo by the bluish-green waters of the Julian Alps in Slovenia, or livin' la vida loca in Barcelona.

Was the previous paragraph Captain Obvious much? Sorry about that, but I just logged into LinkedIn and saw yet another post asking people to vote on a poll about remote vs. office-based work. A lot of companies still pretend that the office vs. remote debate is hot; it is so 2021. We are already making the most of remote work, optimizing and working out the fine details.

Another lapalissade that fostered our decision to go fully remote is that with flexibility come inclusivity and diversity, attributes our team and product long for. When was the last time that a group of individuals that share the same background and culture, as well as a similar postcode and local takeaway restaurant, brought something new and disruptive to the table? Take your time; we can wait. It's not like we have to catch a train—we're already at home.

But as someone wise said a long time ago: with great power comes great responsibility. At MOSTLY AI, we are now channeling all our efforts into strengthening our culture and equipping our teams so they can collaborate effectively, all under the work-from-anywhere setup. This is easier said than done. While some traditional office-based companies might take culture for granted and let it develop organically, in a remote setup, we have to work hard on it. But how can we do it? That's the million-dollar question without a simple answer. We are still learning, and this is what we are focusing on at the moment:

Documentation and enabling asynchronous communication

Documentation, or rather the lack of it, is often the Achilles' heel of organizations, and we are no exception. Still, we are highly invested in not letting this pain grow further.

We have been paving our way by adding to our handbook and department repositories not only the basics but also elaborate content, for instance, the company's and each department's OKRs and KPIs. In this journey, our documentation is the train that offers visibility >>> which leads to transparency >>> whose final destination is trust.

The MOSTLY AI documentation is our centralized source of truth for how we run our team. It is a living body, subject to changes and updates. Still, one thing remains the same: it always reflects our values of 🧑🏽 Excellence, 🤝🏾 Trust, 🧑🏻‍🤝‍🧑🏿 Collaboration, 🤹🏽 Flexibility, and 💪🏽 Fearlessness.

Everyone can contribute; these are living, breathing documents. We want everyone to feel informed and empowered, with access to information in an asynchronous way.

But why focus on asynchronous communication? I'm glad you asked that question, dear reader! In-person communication has its benefits, but in a global team with different levels of seniority, working from different latitudes and timezones, and whose synchronous working hours are so precious, it is vital to enable asynchronous communication. We have the tools available, and we are highly invested in using them to infinity and beyond!

Hiring sustainably and carefully onboarding new team members

We had heard about hypergrowth before, but we decided not to do it. We were all about hiring in a sustainable way even before it was cool. Our mission and product are supported by a long-term strategy that relies on hiring new team members, of course, but also on supporting the growth, skills, and competencies of our existing team.

We know this is not a game for the faint of heart, but we navigate the corridors of the Valley with pride for not having endured any hiring freezes or layoffs.

We are hiring (have you checked our careers page?), but since adopting a fully remote setup, we are doing it in an even more conscientious way, bearing in mind the new challenges arising.

Old pals successfully adjusted their work and social interactions and continued their long-lasting relationships. But while they have been there, done that, bought the t-shirt with this transition, it's a whole different story with new hires. New beginnings are amazing, but they can be a maze, too: so many new things to explore, new team dynamics to discover, and new faces/names to learn. Add 'remotely' to the list of ingredients, and it can be a recipe for disaster. That's why our managers ensure that, on top of having a curated onboarding plan, each new joiner also has an experienced buddy supporting them throughout the onboarding journey, providing inside knowledge or just being there to chat.

Adjusting our benefits to the work from anywhere setup

We have switched from traditional, office-based benefits to remote-friendly ones, such as 25 days of holiday, flexible working hours, and two allowances to support the building and maintenance of a home office. We’re also committed to offering the same type of benefits to everyone, no matter their location.

We are also tackling a common pitfall of being a fully remote and distributed team: some people might not feel socially connected to their teams.

Growing our headcount during a worldwide pandemic didn’t help, but we are committed to helping our teams. At the end of the day, we’re a team of professionals who are looking to build relationships of trust and collaboration with our teammates. We cherish in-person moments too, where we can see our teammates as whole people by literally seeing them and not just looking at their faces while on a Zoom call. Note that when it comes to bonding time, we are all about quality over quantity, so we said goodbye to the watercooler chat, but we still meet in person twice per year (at least). We make use of that time to work together but also to have fun. For instance, our summer party this year was 🔥 for many reasons, none of them being the warm temperature.

We don’t have to be rocket scientists to predict that remote work will not go away anytime soon (or ever?), and, just as with office-based work, it has challenges that must be addressed. Solving them is a marathon, not a sprint. Meanwhile, we will move forward steadily and fearlessly, but also with an open heart and mind, learning from other companies and taking in our team’s feedback. We will continue to evolve, innovate, and push each other to improve by thinking out of the box and believing that without challenge, there is no growth.

Please share your thoughts with us and help us get better.

Thanks for reading.

2021 has passed in the blink of an eye, yet MOSTLY AI can be proud as this was a revolutionary year of many extraordinary achievements. While we are already excited for what 2022 holds for us, we are taking a step back to look at the highlights and major milestones we have accomplished in 2021.

Synthetic data revolution

Our developers had a busy start to the year with the new upgrade of our category-leading synthetic data generator, MOSTLY AI 1.5. Alongside many shiny new features, the big buzz was about our synthetic data generator now supporting the synthesis of geolocation data with latitude and longitude encoding types. Say goodbye to harmful digital footprints and hello to privacy-safe synthetic geodata!

This was not enough for our very ambitious team, so in the second half of the year, they pushed the boundaries even further by truly revolutionizing software testing. With this new version of our platform, MOSTLY AI 2.0 became the first synthetic data platform that can automatically synthesize complex data structures, making it ideal for software testing. By expanding the capabilities to multi-table data structures, MOSTLY AI now enables anyone – not just data scientists – to create synthetic data from databases, automatically. This improves security and compliance and accelerates time to data. Our team truly deserves a toast for this!

The Data Democratization Podcast

We’ll soon be celebrating the first birthday of “The Data Democratization Podcast”, which we started back in January 2021. With over 2,000 downloads in 2021, the podcast was an absolute hit! Our listeners had the opportunity to get so many insights from knowledgeable AI and privacy experts working in top-notch companies who shared their experiences, advice, and real-life case studies. We are entering the new year with even more enthusiasm and are preparing some special surprise guests for you. Stay tuned!

Synthetic data training for superusers

In 2021 we also launched our professional services and training program intended to help create the next generation of synthetic data superusers within enterprises. Several clients have already leveraged this first-of-its-kind program to kickstart their synthetic data journeys, with very positive results. As synthetic data pioneers, we have the most experienced team in the world. Our top engineers, architects, consultants, and data scientists have seen it all. They know what makes or breaks a company's synthetic data adoption, no matter the use case. From scaling ethical and explainable AI to providing on-demand, privacy-safe test data, the know-how is here.

Synthetic data talks

Despite COVID-19, we managed to attend multiple conferences. While most of them happened virtually, we participated in Slush 2021 in person! Our Co-Founder & Chief Strategy Officer Michael Platzer rocked the stage presenting at this year's event in Helsinki, Finland. We are proud to have been invited to present our synthetic data solution to the world and, while staying safe, to connect and exchange ideas with some of the most brilliant minds.

The only synthetic data provider to achieve SOC2 and ISO certifications

With data privacy and information security at the heart of everything we do, our efforts to ensure the privacy and integrity of our customers’ sensitive data by following strict security policies and procedures have been officially recognized this year. In March, we received the SOC 2 Type 2 certification, an audit report capturing how a company safeguards customer data and how well its internal controls are operating. Later, in November, we were awarded the ISO 27001 certification, a globally recognized information security standard.

Thanks to both SOC2 and ISO certifications, our customers and partners can now speed up vetting processes and immediately get a clear picture of the advanced level of information security standards we hold.

Growing the Order of Mostlies

All this wouldn’t be possible without MOSTLY AI’s most important asset – our team (or Mostlies as we like to call them). In 2021, we welcomed quite a few new Mostlies to the team - amongst them new executives to strengthen our product, marketing and sales activities.

The first one to join the team this year was Andreas Ponikiewicz as our Vice President of Global Sales, who took the lead for MOSTLY AI's international sales team across Europe, North America and Asia and has brought our communication with the clients to the next level. Shortly afterward, we welcomed our new CTO, Kerem Erdem, onboard. As a true captain, he is leading us on the way to accelerate our tech performance and enable organizations to thrive in an ethical, responsible way with smart and safe synthetic data. To help get the word out, in early May, Sabine Klisch joined the team as VP Global Marketing and is now leading our creative marketing team on our journey to position MOSTLY AI as the global leader for smart synthetic data. And to spice up the story even more, we have added a special Italian ingredient – Mario Scriminaci, our new CPO who is making sure our synthetic data platform is the number one solution and provides our customers with better-than-real data.

The Best Employer Award

As already mentioned, Mostlies are the most important part of MOSTLY AI and it seems we are doing something right since we made it to the top 3 of Great Place to Work and received Austria's Best Employers 2021 award.

The MOSTLY AI team is truly diverse, with more than 15 different nationalities represented. Almost 40 members strong, we are organized in several teams, including data science, engineering, product, marketing, sales, and operations. The majority of us are based at our headquarters in Vienna, but an increasing number are working remotely, spread across the entire world. What started as a necessity because of COVID-19 has now become an integral part of our company culture.

Looking back, we can say this year has exceeded our expectations by far. We are one team of devoted professionals, all united by the same vision: to empower people with data and build a smarter and fairer future together.

What’s next? 2022 is said to be the year of synthetic data. According to Gartner, by 2024, 60% of the data used for the development of AI and analytics projects will be synthetically generated. 2022 will also be the year of MOSTLY AI and we will have exciting news to share with you very soon.

MOSTLY AI has gathered a team of exquisite experts and data science enthusiasts. We would like you to meet our team 'eye to eye' and get to know all our devoted Mostlies who ensure that we deliver the best product and stand out in the market. Today, we bring into the spotlight Andreas Ponikiewicz, Vice President of Global Sales at MOSTLY AI, who will share more about himself, his role, and his experience with clients who foster innovation with the help of synthetic data.

Can you tell us a bit more about your background and how you ended up working for MOSTLY AI?

My background is actually in economics and finance. When you finish university, you try to work for big companies and get their logos on your CV, but I had the opportunity right before I finished my studies to join an Austrian startup in the finance industry, and I very quickly realized how great such an environment is. Small companies are more innovative. I liked people's mindsets and how a small company with the potential for growth offers products that the big guys don't have. Every client you win is very big, so you celebrate as a family.

Even though I started as a financial data analyst, I soon became curious about who our clients are and why they buy our products. I wanted to talk with and understand them, so I switched to a product management role in which I developed products based on our clients' feedback. I got to travel across Europe and collect feedback from our new and existing customers and then build, together with my team, a product that sold 30 times more than previously — and this is how I realized that communicating with clients is very important. I also realized that I like to talk to customers instead of sitting at the office. So, I switched to a more sales-oriented role, selling the startup's complete product portfolio and eventually becoming Head of Sales. After some successful client wins, we got acquired by a very big company, which led to my corporate life.

However, when you come from a dynamic startup, the regular corporate life is not as interesting. I came to know that MOSTLY AI was looking for a Vice President of Sales to lead the team and build up the business. I applied and got a call back soon after, and that is how I ended up being at MOSTLY AI.

How do you see your role at MOSTLY AI, and what brings excitement to you at work?

It's the big impact, I think. The business we are in is very innovative and exciting. We see that the big industries are only beginning to understand the potential of synthetic data. When I look at the competition, I see us very well-positioned because when we do POCs and talk to clients, we get very good feedback that we are top-notch here, and this is very motivating. I like the idea that data protection and innovation are not enemies and that there is a smart way to combine both. Also, I believe the mindset shift is coming to the industry, and the companies that can show that they care about the privacy of their clients will have a big advantage compared to those who don't. Apple is one of the first who started this trend, and I believe many will follow.

How would you describe synthetic data to an alien?

'I want to know everything about your behavior, but I don't want to know any personal details about you.'

What do you think about synthetic data? Is it really so beneficial for companies as everyone keeps saying?

Synthetic data is not a new concept. Today's AI and ML (machine learning) technology and processing power are so advanced that you now have the capability to recreate highly realistic data sets, which was not possible a few years back. That is what puts the traditional anonymization techniques at risk, because first of all, they are not 100% safe, and secondly, they destroy a lot of the data. Now companies are actively researching this area and see a very good opportunity in synthetic data.

What do you think about data privacy nowadays? Is it impossible to preserve it with all the new technologies?

Some people say that privacy nowadays is an illusion. The moment you have a mobile phone, you give up privacy. If you want to have a bank transaction, you do it online, so people know. If you want to buy something on Amazon, you give up privacy. If you search for something on Google, you give up privacy. If you have a LinkedIn account, you give up privacy.

In the end, you can choose between convenience or living the life of Robinson Crusoe. Somewhere in between, you need to decide if you want to have convenience or privacy. When it comes to business, that is a different story. As a company, you can say we truly care about your privacy, what you buy, how much you earn, where you live, and what you like, and we do not share it with anyone else because your data is safe with us. That is a business-to-business value proposition.

Can you tell us about some interesting data leaks you know and how they happened? How could the company have avoided it?

There was one major data leak when someone found out how Amazon had a super-biased algorithm that favored white male candidates, so every woman who applied was just discarded because of the algorithm. This was then leaked somehow, which seriously damaged their reputation. Netflix also had a data leak when researchers identified sensitive information based on subscribers' movie ratings. Many financial institutions also lost data, which is a disaster from a reputational perspective. The more you search, the more leaks you find. It is not something that happens from time to time; it is actually a constant danger. That is why you should choose privacy by design; it is the only way to protect your data.

Where are you originally from, and how much is data privacy regulated there?

I was born in Poland, but I grew up in Austria and went to kindergarten here. I can say that privacy is taken very seriously in Austria. Germany and Austria, especially, are territories that invest a lot in data protection. This sometimes makes it difficult for them to innovate and leverage data. It is one thing to protect your data, but if you don't have a technology, like synthetic data, to unlock the value of data assets, you might not be able to innovate. The problem is that if you want to innovate, you need data, but on the other side, you need to be careful what data you share and what information you disclose. This is why I see the potential of our business in bringing both together — unlocking the data that companies have but in a 100% private way.

Do you have any good documentaries or movies related to AI and big data that you could recommend to our readers?

We had a virtual movie night here at MOSTLY AI just a few days ago, and we watched this documentary called 'Coded Bias,' and that is a pretty interesting film for someone who wants to dive into this AI world a bit and learn about why we have data bias, why fairness in AI matters, and what are its risks and challenges. On the other hand, data bias is an issue that, aside from privacy concerns, can be solved with synthetic data. I would recommend watching this because it can definitely trigger some thinking.

Aside from an obvious interest in AI and machine learning, what do you like to do in your free time?

I listen to tons of music. I like to read a lot about history. I am interested in all of the big developments over the last centuries because I can derive some understanding of what is happening today, and I believe it is all somehow connected. I go to the gym if it is open, but now, during the COVID pandemic, it is a bit difficult to do so, unfortunately. Then, of course, I like to meet friends afterwards. I am a guy who likes to talk to people.

Since we are in the middle of a global pandemic, how much has your life changed?

Dramatically. Before the pandemic, I was traveling almost every week across Europe because that is the main part of my business role — meeting clients — one week in Switzerland, the next week in London, and so on. Also, I used to meet friends often, and not being able to do that or travel for business has been quite a lot to deal with. So, from time to time, it would be nice to do that again.

Regarding the spread of pandemics, how do you think synthetic data could help solve this issue?

People say it is just a matter of time before the next pandemic pops up because there are so many interactions between humans and nature with the potential that a new virus could spread from animals to humans. I think that, in the case of the city of Vienna, for example, synthetic data could be applied to analyze some patient data, the spread of the virus, and how people move. If this could be anonymized, and they could still track the movement of people on a global level without leaking any patients' private information or who went where and when, I believe this could be very helpful when trying to understand the dynamics of the pandemic.

We hope to see that application in the future. What other possible applications of synthetic data are currently in practice?

There are so many applications and use cases for synthetic data already. One can analyze people's behavior anywhere they interact. So, in companies such as those in finance, insurance, telecommunications, retail, health, pharmaceuticals, and all others that depend on interactions between people, if you want to analyze their behavior, synthetic data can be applied. The scope is so broad that I believe it is a good thing we are focusing on certain areas because otherwise, we would be lost in all those potential applications that exist. I believe that in the next few years, many companies will focus on a different niche.

How do you think the regulatory landscape will change in the next year or so?

I think that the regulators will also catch up when it comes to understanding how AI and ML work. Currently, many regulations are still based on an old economy and old-world design, and they will get more sophisticated about AI, how AI should be controlled, what it can be used for, and what requirements it should fulfill to ensure that it is not used against the good of humankind. This is still like a black box to many regulators and companies. It is the wild west. No one is really focused on what impact AI has because we see it being used everywhere. I think the regulatory bodies need to acquire a better understanding of AI and derive regulations that show they really understand the impact of AI because I don't believe they do.

What is the most pressing issue for enterprises on the IT and data front? In your experience, what makes large organizations succeed?

Being innovative. Understanding that every product offering and every service you have has a certain lifecycle. That means a product that is new today might be irrelevant in a few years. The companies that succeed are the ones that realize that you have to change your offer, that you have to adapt your product, that your customers change, and that new clients have new requirements. Companies that understand the cycle of innovation will remain in the market, while the ones that rely too much on what they have will become dinosaurs and disappear sooner or later. The question is whether a company has enough data to understand its clients and is able to derive decisions from that, innovate, and bring new products and services to market. Those companies are the ones that will remain the leaders in the market.

What would your best advice be for those looking to implement synthetic data in their companies?

Think big! When you face a challenge, for example, not being able to use the data you have because it violates privacy laws, you need to think about what your potential is and how you can apply this to different areas, such as using it for ML models, to understand the behavior of your clients, to understand what they buy and why, how they spend their money, etc. You can derive insights from that and develop products you couldn't even think of before because now you have the insights into data you didn't have before. So, while before you worked with assumptions, now that changes to knowledge, and this is enabling marketing and product management to be customer-centric.

Any final remarks?

I am very much looking forward to the next one or two years because I believe we are at the beginning of a wave, and we see that big institutions but also small companies are now becoming very interested in the potential of synthetic data. I believe that in the foreseeable future, many companies will have a big advantage if they can leverage this potential compared to the ones who still live in the traditional world.

"Those who innovate will remain in the market; the others will become dinosaurs and soon become extinct."

Andreas Ponikiewicz
Vice President of Global Sales at MOSTLY AI

If you would like to know more about synthetic data and innovation, feel free to reach out to Andreas!

We had such a busy start to 2021! Our developers worked hard to deliver much anticipated new features to simplify our customers' lives with faster, safer, and easier processes. A serious legal assessment was underway, while the MOSTLY AI team also made the SOC 2 certification happen. Microsoft, Telefónica, the City of Vienna, and many others have been using our synthetic data generation platform to make the most of their data assets, with Erste Group signing a 3-year partnership last month. An important piece of research was also born, proving that synthetic data for Explainable AI will be an important use case.

The feedback we have received so far makes it abundantly clear that AI-generated synthetic data is the way to go for large organizations looking to step up their data game. And the new version of our category-leading synthetic data generator, MOSTLY AI 1.5, is the tool that provides the level of maturity, usability, and data quality that is crucial to scale synthetic data in an organization.

Legal support for synthetic data is part of the product upgrade

Privacy protection and data security have a special place in our hearts. We take this very seriously, and completing the SOC 2 certification is a very meaningful step for the team, reinforcing all that we stand for. SOC 2 assures our customers that we follow consistent security practices and that we are able to keep their valuable data always safe and protected through the implementation of standardized controls.

Another important way in which we support our customers' legal teams is by providing a Data Protection Impact Assessment (DPIA) blueprint for MOSTLY AI's synthetic data platform. This document, created in collaboration with the reputable law firm Taylor Wessing, will allow legal teams to demonstrate compliance to regulators easily.

Work faster and synthesize data easier

You can now use the Data Catalog to enable carefree automation of synthetic data pipelines and store links to data sources together with their configuration settings. Synthesis is now a one-click job.

Using the REST API, you can create fully automated synthetic data pipelines. You can easily integrate MOSTLY AI's synthetic data platform with upstream ETL applications and downstream post-processing tools.

GPU accelerated synthetic data is like synthetic data with wings. Using the brand new GPU training option, you can now synthesize your sequential datasets in considerably less time, without any impact on synthetic data quality or privacy.

MOSTLY AI 1.5 now natively supports Parquet files, enabling faster time-to-data, as converting to CSV is no longer necessary. From now on, you can save your encoding configurations as a JSON file and use your own tooling to generate configuration settings for datasets with a large number of columns.
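As a rough illustration of what "your own tooling" could look like, here is a minimal Python sketch that builds a per-column encoding configuration from one sample row and serializes it to JSON. The JSON layout (`columns`, `name`, `encoding_type`) and the encoding-type names (`NUMERIC`, `CATEGORICAL`) are assumptions made up for this example, not the actual MOSTLY AI configuration schema; check the product documentation for the real format.

```python
import json


def guess_encoding(sample):
    """Pick a placeholder encoding-type name from one sample value.

    The names returned here are hypothetical, not official MOSTLY AI values.
    """
    if isinstance(sample, bool):
        # bool is a subclass of int in Python, so handle it first
        return "CATEGORICAL"
    if isinstance(sample, (int, float)):
        return "NUMERIC"
    return "CATEGORICAL"


def build_encoding_config(sample_row):
    """Build a JSON-serializable config with one entry per column."""
    return {
        "columns": [
            {"name": name, "encoding_type": guess_encoding(value)}
            for name, value in sample_row.items()
        ]
    }


# Example: one sample row standing in for a wide table
sample_row = {"age": 42, "country": "AT", "balance": 1050.75, "active": True}
config_json = json.dumps(build_encoding_config(sample_row), indent=2)
print(config_json)
```

For a table with hundreds of columns, generating the file this way and then correcting the few columns the heuristic gets wrong is usually quicker than setting every encoding by hand.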

Now there is also a turbo button for synthetic data generation: you can now choose to optimize model training for speed. It's really fast and the resulting synthetic data is only a little less accurate. Great for use cases where speed is of utmost importance, but accuracy isn’t paramount, like creating realistic data for testing.

Stay safe with added synthetic data controls

MOSTLY AI’s new User Management system allows you to securely control user access to data, run job details, and synthetic data generation features. Onboarding and offboarding employees is now a breeze. Users can log in using their Active Directory credentials.

You can now use stochastic rare category protection thresholds for categorical variables, which randomize the decision of whether to include or exclude categories whose frequency in the data is very close to the inclusion threshold. This now makes it impossible to infer even the parameters of the rare category protection, adding a further layer of protection for outliers and extreme values.
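To make the idea concrete, here is a minimal Python sketch of a randomized inclusion threshold. It is not MOSTLY AI's implementation; the function name, the `threshold` and `jitter` parameters, and the `_RARE_` placeholder are all assumptions chosen for illustration. Each category gets its own effective threshold drawn around the nominal one, so categories far above it are always kept, categories far below it are always suppressed, and only those close to it are decided by chance.

```python
import random
from collections import Counter

RARE_TOKEN = "_RARE_"  # hypothetical placeholder for suppressed categories


def protect_rare_categories(values, threshold=5, jitter=2, seed=None):
    """Replace rare categories with a placeholder, using a randomized cut-off.

    Each category's effective threshold is drawn uniformly from
    [threshold - jitter, threshold + jitter], so an observer cannot read
    the exact threshold off the output.
    """
    rng = random.Random(seed)
    counts = Counter(values)
    keep = set()
    for category, count in counts.items():
        effective = threshold + rng.randint(-jitter, jitter)
        if count >= effective:
            keep.add(category)
    return [v if v in keep else RARE_TOKEN for v in values]


# "a" is frequent, "b" is very rare, "c" sits right at the nominal threshold
values = ["a"] * 20 + ["b"] * 1 + ["c"] * 5
protected = protect_rare_categories(values, threshold=5, jitter=2)
print(Counter(protected))
```

Across repeated runs, "a" always survives and "b" is always replaced, while "c" is kept in some runs and suppressed in others.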

The consistency correction feature helps generate consistent historical sequences for your synthetic subjects when there is a large variety of values. Users can enable consistency correction per categorical column in their event table, and Admins can configure in the Global run settings whether Users can work with this feature.

A new encoding type: synthetic geolocation data

Due to popular demand, we are now supporting the synthesis of geolocation data with latitude and longitude encoding types. It's time to get those footprint datasets ready to work for you in a privacy-preserving way!

We would love to hear your feedback! If you are using MOSTLY AI 1.5, please let us know what you think, as we continuously strive to build an even better product for you. If you are not yet our customer but are curious to find out how our synthetic data platform can increase the ROI of your data projects, contact us for a personalized demo!

For MOSTLY AI, the security of our customers' data is a top priority. Our efforts to ensure the privacy and integrity of their sensitive data by following strict security policies and procedures have now been officially recognized. We are pleased to announce that we are SOC 2 Type 2 certified!

What are SOCs?

SOCs, or System and Organization Controls (formerly Service Organization Controls), are a set of compliance standards that were developed by the American Institute of CPAs (AICPA), a member network of more than 430,000 CPAs around the world. The ability of a company to handle confidential information is examined through an independent auditing process of the organization’s policies, procedures, and internal controls. Testing and reporting of these controls is important because they impact the security, privacy, and confidentiality of sensitive data.

Why does a SOC 2 certification matter?

Working with SOC 2 certified vendors, such as MOSTLY AI, assures the customers that the vendor follows consistent security practices and is able to keep customers’ valuable data always safe and protected through the implementation of standardized controls as defined in the 'AICPA Trust Service Principles framework'. The idea of synthetic data was born out of the need for a bullet-proof data privacy technology. We live and breathe data privacy and being fully aware of all that it entails makes us extra motivated in following state-of-the-art security measures.

“I believe that our SOC 2 type 2 certification proves that our internal processes are completely aligned with the protection of users’ data. It shows how mature we are as an organization and that our customers and their data are in safe hands.” Tobias Hann, CEO, MOSTLY AI