Use the Text encoding type to synthesize unstructured natural language texts up to 1000 characters long.

You can use this encoding type to generate realistic, representative, and anonymous texts such as financial transaction descriptions, short user feedback, medical assessments, and PII fields. Because the synthetic texts reflect the terms, tokens, and token co-occurrences of the original data, they are suitable for analytics and machine learning use cases such as sentiment analysis and named-entity recognition. They may look noisy and less human-readable than the originals, but that does not prevent them from serving these use cases well.

Note, however, that our privacy and accuracy tests cannot detect potential leakage of protected rare categories in synthetic texts, nor can they measure how representative those texts are.
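Because these tests do not cover rare-category leakage in texts, you may want to run a check of your own. The following is a minimal Python sketch, assuming you can load the original and synthetic text columns as lists of strings; the function names and the `max_count` threshold are hypothetical and not part of MOSTLY AI. It flags tokens that occur rarely in the original data and also appear in the synthetic texts.

```python
import re
from collections import Counter


def tokenize(text):
    # Simple lowercase word tokenizer (hypothetical helper); a real pipeline
    # may need a language-aware tokenizer.
    return re.findall(r"\w+", text.lower())


def leaked_rare_tokens(original_texts, synthetic_texts, max_count=1):
    """Return original tokens seen at most `max_count` times that also appear
    in the synthetic texts (hypothetical check, not a MOSTLY AI feature)."""
    counts = Counter(tok for text in original_texts for tok in tokenize(text))
    rare = {tok for tok, n in counts.items() if n <= max_count}
    synthetic_tokens = {tok for text in synthetic_texts for tok in tokenize(text)}
    return sorted(rare & synthetic_tokens)


original = ["Paid rent to Svoboda", "Paid rent to Ruiz", "Paid rent to Ruiz"]
synthetic = ["Paid rent to Svoboda", "Paid rent to Kort"]
print(leaked_rare_tokens(original, synthetic))  # ['svoboda']
```

An exact token match is only a rough proxy: it will not catch near-duplicates or sensitive values split across several tokens, so treat any returned token as a candidate for manual review rather than proof of leakage.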

Our text synthesis model is language-agnostic and does not inherit the biases of pre-trained models: all content is learned solely from the original training data. As a result, it can process any language, vernacular, or slang present in the original data.

The amount of data required to produce usable results depends on the diversity of the original texts' vocabulary, categories, names, etc. As a rule of thumb, the more structure there is, the fewer samples are needed.
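To get a rough feel for how diverse your texts are before deciding how much data to collect, you could compute simple vocabulary statistics. This is a minimal Python sketch using only the standard library; the metrics it reports (type-token ratio and the share of words seen only once) are common heuristics chosen for illustration, not thresholds used by MOSTLY AI.

```python
import re
from collections import Counter


def vocabulary_stats(texts):
    """Summarize vocabulary diversity of a list of texts (rough proxy only)."""
    tokens = [tok for text in texts for tok in re.findall(r"\w+", text.lower())]
    counts = Counter(tokens)
    n_tokens = len(tokens)
    n_types = len(counts)
    # Words that occur exactly once ("hapaxes") hint at a long-tailed vocabulary.
    hapaxes = sum(1 for n in counts.values() if n == 1)
    return {
        "tokens": n_tokens,
        "vocabulary_size": n_types,
        "type_token_ratio": n_types / n_tokens if n_tokens else 0.0,
        "hapax_share": hapaxes / n_types if n_types else 0.0,
    }


stats = vocabulary_stats(["Good food", "Good service", "Bad food"])
print(stats["vocabulary_size"], stats["hapax_share"])  # 4 0.5
```

A high type-token ratio or a large share of one-off words suggests a diverse vocabulary, which, per the rule of thumb above, likely calls for more samples. The type-token ratio falls as a corpus grows, so compare it only across samples of similar size.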

The synthetic texts are generated in a context-aware manner: messages from a teenager differ from those of an 85-year-old grandmother, for instance. MOSTLY AI takes the other attributes of each synthetic subject’s profile into account and synthesizes natural language texts appropriate to that profile.

Below are two examples. The first demonstrates MOSTLY AI’s ability to synthesize entirely new names from a multilingual dataset. The second shows synthesized Tripadvisor reviews: the resulting texts accurately retain the context of the establishment they discuss (Restaurant or Hotel) and are consistent with the synthesized rating.

Multilingual names dataset

Original
    Nationality     Name
 1: Czech           Svoboda
 2: Greek           Chrysanthopoulos
 3: Spanish         Ventura
 4: Russian         Gagarin
 5: Japanese        Yokoyama
 6: English         Parsons
 7: Spanish         Ruiz
 8: Russian         Chekhov
 9: English         Blake
10: English         Wigley

Synthetic
    Nationality     Name
 1: English         Olsewood
 2: German          Kort
 3: Japanese        Misaghi
 4: English         Roger
 5: Russian         Lusov
 6: Russian         Zhuszenko
 7: Japanese        Noraghi
 8: English         Dalman
 9: Russian         Michov
10: Polish          Poskan
11: Arabic          Shaif

Tripadvisor reviews

Original
   Establishment    Rating  Review
1: Restaurant       6       Not bad, great interior but let down by
                            unimaginative food. Perfectly good for a
                            quick lunch or drink though, good ales!
2: Hotel            2       Awful!!! stunk of smoke! guttering outside
                            window ledge filled with cigarette ends and
                            bottles. NOISY air con unit in room, husband
                            had no sleep.
3: Restaurant       8       Helpful staff, pleasant enough with quick
                            service. Sat at bar by revolving food server.
                            Everything seemed nice and fresh. Good value.
4: Hotel            8       We stayed in a standard room at the hotel.
                            The room was adequate, though a bit short on
                            cupboard/ drawer space.
5: Hotel            4       Expected much more from here and they just
                            didn't deliver, for the price of the room it
                            was no different than any of the other cheaper
                            aparthotels.

Synthetic
   Establishment    Rating  Review
1: Restaurant       8       My only complaint are the portion sizes. Lovely
                            restaurant with good food, though.
2: Restaurant       8       I'm a year-round regular. Service is really
                            friendly. The starters are OK and the seafood
                            buffet is amazing and tasty. Overall a nice menu
                            throughout and our children love it.
3: Hotel            8       This is a fantastic hotel. Great food but few
                            options, a brilliant room and spent an excellent
                            time. Very clean environment and a high level of
                            service.
4: Hotel            2       I booked an offer for a spa day. The food was
                            below-average, the room was dated, smelled of
                            fried fish, and the staff has an attitude.
5: Hotel            2       We've been here before. But for £70, the rooms
                            are still poor and glamourless. We spent a few
                            days with 5 people and 4 were not impressed.