OK, so you and your company have decided that this Big Data is more than a buzzword, and that you’re going to jump feet first into this. You’ve worked with various business leaders, planned out a pilot project, and set aside a budget. But then things come to a grinding halt! Where do you find the skills to actually deploy this?
Despite all the activity around Big Data, there is still a significant shortage of skilled professionals who can truly be called Data Scientists: people who can evaluate business needs and impact, write the algorithms, and program platforms such as Hadoop.
The Hadoop ecosystem is broad, and it brings with it a new menagerie of jargon and projects: HDFS, HBase, Hive, Pig, ZooKeeper, MapReduce, and the R language that is often used alongside them, just to name a few.
During my trip to the Bay Area this week, I was very encouraged to hear from and speak with several companies that have taken some very positive steps towards helping the IT community bridge this gap.
The most significant, to me, are the companies providing libraries or interfaces that let traditional database administrators (DBAs), who have spent years learning, honing and perfecting their skills on well-known platforms such as Oracle, IBM DB2 and others, put those skills to work on Hadoop. These traditional database platforms, known as relational database management systems (RDBMS), all use a language called SQL (Structured Query Language). Some Big Data companies are beginning to look at ways of taking SQL and allowing its queries to be run on Hadoop.
Now, RDBMS and SQL somewhat go against the principles that led to Hadoop being created in the first place. An RDBMS requires a predefined structure (known as a schema) for the data being stored, whereas the basic idea behind Big Data systems is to break down these traditional schemas so that data can be queried and analyzed by any number of factors.
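To make the schema point a little more concrete, here is a minimal Python sketch of querying records that share no predefined structure. It is only an illustration of the idea, not how Hadoop itself is programmed: the sample records and the helper names `count_by` and `total_of` are hypothetical, invented for this example.

```python
import json
from collections import Counter

# Hypothetical raw records: each one carries whatever fields its source
# happened to produce. No schema was declared before storing them.
RAW_EVENTS = [
    '{"user": "alice", "action": "login", "device": "phone"}',
    '{"user": "bob", "action": "purchase", "amount": 42.5}',
    '{"user": "alice", "action": "purchase", "amount": 10.0, "coupon": "SPRING"}',
    '{"sensor": "t-17", "temperature": 21.3}',
]


def count_by(raw_lines, field):
    """Count records per value of `field`, chosen at query time.

    Records that do not have the field are simply skipped.
    """
    counts = Counter()
    for line in raw_lines:
        record = json.loads(line)
        if field in record:
            counts[record[field]] += 1
    return dict(counts)


def total_of(raw_lines, field):
    """Sum a numeric field across all records that happen to carry it."""
    total = 0.0
    for line in raw_lines:
        value = json.loads(line).get(field)
        if isinstance(value, (int, float)):
            total += value
    return total


if __name__ == "__main__":
    print(count_by(RAW_EVENTS, "user"))
    print(count_by(RAW_EVENTS, "action"))
    print(total_of(RAW_EVENTS, "amount"))
```

The structure is applied when the question is asked rather than when the data is stored, which is why the same pile of records can be sliced by user, by action, or by any other factor that turns out to matter later.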
What does all this mean?
In the most extreme case, it means that traditional Oracle or DB2-based applications could essentially run on top of Hadoop. In more realistic applications, it means that some traditional applications could be migrated to run on Hadoop, as new data sources are integrated with traditional structured databases. New queries could then be created to take advantage of the traditional and the new data sources together to provide new insight and value to the business.
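As a rough sketch of what such a combined query might look like, the Python example below joins a traditional structured table with a newer, log-style data source using ordinary SQL. Everything here is an assumption made for illustration: Python's built-in `sqlite3` module merely stands in for whatever SQL layer a vendor might put on top of Hadoop, and the table names, sample rows, and function names are hypothetical.

```python
import json
import sqlite3


def build_demo_db():
    """Build an in-memory database holding one traditional and one new source."""
    conn = sqlite3.connect(":memory:")

    # Traditional structured data, as a DBA would already manage it today.
    conn.execute(
        "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, region TEXT)"
    )
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?, ?)",
        [(1, "Acme", "West"), (2, "Globex", "East")],
    )

    # New data source: raw click records (hypothetical), loaded so that
    # they can be reached through the same SQL interface.
    raw_clicks = [
        '{"customer_id": 1, "page": "/pricing"}',
        '{"customer_id": 1, "page": "/docs"}',
        '{"customer_id": 2, "page": "/pricing"}',
    ]
    clicks = [json.loads(line) for line in raw_clicks]
    conn.execute("CREATE TABLE page_views (customer_id INTEGER, page TEXT)")
    conn.executemany(
        "INSERT INTO page_views VALUES (?, ?)",
        [(c["customer_id"], c["page"]) for c in clicks],
    )
    return conn


def views_per_region(conn):
    """Join the traditional table with the new source using familiar SQL."""
    query = """
        SELECT c.region, COUNT(*) AS views
        FROM customers AS c
        JOIN page_views AS v ON v.customer_id = c.id
        GROUP BY c.region
        ORDER BY c.region
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    print(views_per_region(build_demo_db()))
```

The query itself is nothing a working DBA has not written many times before, and that is exactly the point: the skill carries over even though the data underneath has changed.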
Most importantly, it means that DBAs who have spent so much of their time, energy, and money developing their skills in the traditional RDBMS world will have a new future in the Big Data world. It means that enterprises have the opportunity to help their IT professionals retrain and maintain job security, without losing the intimate knowledge of the enterprise that many of these DBAs have gained over the years.
Finally, it means that enterprises have an opportunity to step quickly into Big Data, which in my opinion is an absolute necessity. At Neuralytix, we believe that when it comes to Big Data:
If you’re not doing it, your competitors are!
There is still a long way to go. For budding data scientists, now is the time to go get yourself educated! You’re in great demand. Until the skills shortage is addressed, opportunities are ripe for developers to find new and interesting ways of bridging the traditional with the innovative.