Over the last few months I’ve been working with more and more people who have the buzz-phrase “Big Data” on their minds. For some reason, many people think that “Big Data” (in this context, I’m referring to “Big Data Analytics”) has magical powers and doesn’t require the same careful planning and modeling process that traditional analytics does.
Let’s go ahead and put this thought to rest now. “Big Data” is still “Data,” and the data model will make or break your project. The Data Architect (note that I didn’t say “Data Scientist” — that’s a bit different, and we’ll get into it in a later post) is still a key position on any Big Data project. In fact, given the overwhelming amount of data being crunched, analyzed, and stored, the Data Architect is likely your most important asset.
It is true that traditional, narrowly-focused data modeling techniques will not work in the Big Data world. Data Architects working with Big Data must understand that it requires thinking in an entirely new dimension (forgive the pun) when modeling data structures. Not only must you consider the traditional modeling of relationships, structures, and indexes; you must also understand how parallel processing will impact your queries, how data will be loaded in partitioned systems, and how adding new nodes to a Big Data cluster will impact existing data models. These are new skills that Data Architects who plan to work with Big Data must learn.
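To make the last point concrete, here is a minimal sketch (in Python, with an assumed `partition` helper — not any particular platform’s API) showing why adding a node to a cluster can disrupt an existing data model. With naive mod-based hash partitioning, growing the cluster from four nodes to five reassigns the majority of keys to different nodes, forcing a large reshuffle of data:

```python
import hashlib

def partition(key: str, num_nodes: int) -> int:
    """Naive hash partitioning: node = hash(key) mod N."""
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % num_nodes

keys = [f"customer-{i}" for i in range(10_000)]

# Where each key lives on a 4-node cluster, then after adding a 5th node.
before = {k: partition(k, 4) for k in keys}
after = {k: partition(k, 5) for k in keys}

moved = sum(1 for k in keys if before[k] != after[k])
print(f"{moved / len(keys):.0%} of keys changed nodes after adding one node")
```

This is exactly the kind of consideration that never comes up in single-server modeling: a design choice as simple as the partitioning function determines whether scaling out is routine or a painful migration (which is why many distributed stores use techniques such as consistent hashing instead).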