IMDB – Project Update

I’ve read through the entire actors.list.txt file. Let me take a quick second to bitch about the data.

A data file should contain nothing but the data. At the top of this file (and, I assume, all of their files) are licensing terms and instructions for reading the file. These belong in a separate actors.readme file.

But then... then I got to the end of the file. It has a whole lot of instructions, in exhaustive detail, on how to send them updates and add new information to the files.

I appreciate the instructions; however, the file is GBs in size and most normal people cannot even open it. Again, this information should be in an actors.readme file. Now the program has to watch out for non-data content at the top of the file as well as at the bottom.
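To make that concrete, here is a minimal sketch of how a program could skip the non-data text at both ends while streaming the file. The function name data_lines and the two marker strings are hypothetical: it assumes you have found one line that always comes right before the data and one that always comes right after it, which I have not pinned down here.

```python
from typing import Iterable, Iterator


def data_lines(lines: Iterable[str], start_marker: str, end_marker: str) -> Iterator[str]:
    """Yield only the lines strictly between start_marker and end_marker.

    Both markers are assumptions: pass whatever text reliably begins the
    last header line and the first footer line in your copy of the file.
    """
    in_data = False
    for line in lines:
        text = line.rstrip("\n")
        if not in_data:
            # Still in the licensing/instructions header: skip until the marker.
            if text.startswith(start_marker):
                in_data = True
            continue
        if text.startswith(end_marker):
            # Everything from here on is the update-instructions footer.
            break
        yield text
```

Because it is a generator, it can be fed an open file handle directly, so the multi-GB file never has to sit in memory all at once.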

Which is ok. I like to bitch about it. But since I’ve loaded the entire file as raw data into a database, I can quickly locate these kinds of problems and just delete them from the raw data. It is frustrating, however, to have this added task, and it is probably needed for all the other files as well.
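For anyone who wants to try the load-then-delete approach, here is a rough sketch using Python's built-in sqlite3 module. This is not my actual setup: the table name raw_lines, its columns, and both function names are made up for illustration, and it assumes you already know the line numbers where the real data starts and ends.

```python
import sqlite3
from typing import Iterable


def load_raw(conn: sqlite3.Connection, lines: Iterable[str]) -> None:
    """Store every line of the file, numbered from 1, in a raw staging table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS raw_lines (line_no INTEGER PRIMARY KEY, text TEXT)"
    )
    conn.executemany(
        "INSERT INTO raw_lines (line_no, text) VALUES (?, ?)",
        enumerate((line.rstrip("\n") for line in lines), start=1),
    )
    conn.commit()


def delete_outside(conn: sqlite3.Connection, first_data_line: int, last_data_line: int) -> int:
    """Delete header and footer rows outside the data range; return rows removed."""
    cur = conn.execute(
        "DELETE FROM raw_lines WHERE line_no < ? OR line_no > ?",
        (first_data_line, last_data_line),
    )
    conn.commit()
    return cur.rowcount
```

Keeping the original line number on every row makes it easy to look at the rows around a suspected boundary before deleting anything.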
