💡 Download the complete guide to AI-generated synthetic data!
Go to the ebook
July 25, 2023
3m 51s

Tips and Tricks for Synthetic Data Generation: Missing dates

Transcript

How to prep your data with missing dates for synthetic data generation? Tune in to find out how a few simple SQL tricks can help! Once your dataset is ready, simply go to mostly.ai, register your free forvever account and upload your data.

Transcript

[00:00:00] In today's tips and tricks video, I'm going to show you

[00:00:04] how to do a little bit of data engineering,

[00:00:06] a little bit of data prepping to improve

[00:00:09] the quality and the ease of use of a data set with dates in it.

[00:00:15] Specifically,

[00:00:16] it's a data set that you may know from other videos, with baseball players.

[00:00:23] If you think of the dataset baseball players,

[00:00:25] it has some birthdates

[00:00:28] and it has some death dates for people that are already sadly deceased,

[00:00:34] and that means that some of the values are going to be missing.

[00:00:39] If somebody is still alive, they don't have a death date,

[00:00:43] and so that field in that row is going to remain empty.

[00:00:48] In a CSV file,

[00:00:50] that might be just an empty string,

[00:00:54] but for a date, we have to use a different technique.

[00:00:57] Let me show you.

[00:01:00] First of all,

[00:01:02] if you try to just take a table

[00:01:05] that has the date as a string, coded as a string,

[00:01:12] you may be tempted to do something like this, which is ALTER TABLE,

[00:01:16] then the name of the table ALTER COLUMN birthdate,

[00:01:18] TYPE date, so just forcing it into date.

[00:01:22] Then, it says, "Hey, you cannot do that,

[00:01:25] you may need to specify what's called a cast.

[00:01:31] It's a specific operation to change the type.

[00:01:36] If you try that, without further ado,

[00:01:40] you get an error message here saying that some of the dates

[00:01:46] were not appropriate dates because an empty string is not a date.

[00:01:52] How do you fix that?

[00:01:54] For that, you can use the UPDATE command,

[00:01:57] an UPDATE TABLE is actually a syntax error,

[00:01:59] so I'm going to show you straight away the correct syntax.

[00:02:02] You just say UPDATE, then the name of the table.

[00:02:05] You say 'set' for what you actually want to change.

[00:02:08] You say, birthdate=NULL where

[00:02:27] Null is a special value which means empty. I'm doing that for both birthdate and for deathdate as well.

[00:02:36] Here you see I'm fixing both the birthdate and the deathdate by putting in null values where previously there were empty strings.

[00:02:45] If you do that, then you can go and say, ALTER TABLE, ALTER COLUMN birthday type,

[00:02:51] TYPE date USING birthdate, and then this notation here means cast. It means cast the birthdate to a date, and that works, and then the same thing you can do on the deathdate, that also works, and then you can check out the result by saying something like this here.

[00:03:14] You see that the data is still there, it's displaying nicely.

[00:03:21] Then if you ask for the schema of that table, then you see that the birthdate and the deathdate have been set to timestamp without time zone.

[00:03:33] Why without time zone? Well, because it doesn't really matter if you don't have a time. It doesn't really matter what time zone. Automatically that was the type chosen.

[00:03:47] That was it. Thank you for watching.

Ready to try synthetic data generation?

The best way to learn about synthetic data is to experiment with synthetic data generation. Try it for free or get in touch with our sales team for a demo.
magnifiercross