Every day, enormous amounts of information are added to our world. Most of this is digital information, and the digital universe continues to expand at an exponential rate. IDC estimates that by 2020 there will be over 40 ZB of data in the digital universe, or about 1.7 MB of new information created for every person on the planet every second of every day. Consider Facebook alone: it has about 1.97 billion users worldwide who are active at least monthly. The amount of data added to Facebook in just an hour is almost too much to fathom.
All of this information is stored in structured and unstructured forms. Structured data, such as a database table, is organized into predefined fields, so it is typically easy for algorithms to search. Unstructured data, such as free-form text, is much harder to sift through. Text analytics, or text mining, is the process of extracting useful information from all this digital data, particularly from unstructured text.
For a government, text analytics can aid border security by helping to identify threats at screening or even to predict future issues. For a business, text mining can sift through huge volumes of documents and provide accurate insights drawn from a wide range of data sets. It can also help a business engage customers more effectively, because natural language processing search algorithms can reveal what customers are thinking from the way they search and talk.
Applying big data analysis to structured data is fairly straightforward, and search algorithms for it already exist. The challenge for the future, however, is developing effective extractors for unstructured data. What's needed are tools such as entity resolution, which identifies mentions of the same entity across multiple data sets, and sentiment analysis software, which gauges how people are reacting to something based on their social media posts.
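To make these two techniques a little more concrete, here is a deliberately tiny Python sketch. It is not how production systems work: the sentiment lexicon, its weights, and the list of company suffixes are hypothetical toy values chosen for illustration, and real tools typically rely on large curated lexicons or trained models for sentiment, and on fuzzy or probabilistic matching for entity resolution. The sketch only shows the basic idea: score a post by summing the weights of known words, and treat differently written mentions as the same entity when they reduce to the same normalized key.

```python
import re
from collections import defaultdict

# Hypothetical toy lexicon: +1 for positive words, -1 for negative words.
SENTIMENT_LEXICON = {"love": 1, "great": 1, "happy": 1,
                     "hate": -1, "awful": -1, "slow": -1}

# Hypothetical list of legal suffixes ignored when comparing company names.
COMPANY_SUFFIXES = {"inc", "corp", "corporation", "co", "ltd", "llc"}


def sentiment_score(post: str) -> int:
    """Sum the lexicon weights of the words in a post (>0 positive, <0 negative)."""
    words = re.findall(r"[a-z']+", post.lower())
    return sum(SENTIMENT_LEXICON.get(word, 0) for word in words)


def normalize_name(name: str) -> str:
    """Build a crude match key: lowercase, strip punctuation, drop company suffixes."""
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    return " ".join(t for t in tokens if t not in COMPANY_SUFFIXES)


def resolve_entities(mentions: list[str]) -> dict[str, list[str]]:
    """Group raw mentions that share the same normalized key."""
    groups: dict[str, list[str]] = defaultdict(list)
    for mention in mentions:
        groups[normalize_name(mention)].append(mention)
    return dict(groups)


if __name__ == "__main__":
    print(sentiment_score("I love this phone, the camera is great"))  # 2
    print(sentiment_score("Awful update, I hate it"))                # -2
    print(resolve_entities(["Acme Corp.", "ACME corporation", "Acme, Inc.", "Globex Ltd"]))
    # {'acme': ['Acme Corp.', 'ACME corporation', 'Acme, Inc.'], 'globex': ['Globex Ltd']}
```

Exact-key grouping like this misses misspellings and abbreviations, which is one reason building reliable extractors for unstructured data remains hard.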
Less than 1% of our digital data is currently being effectively analyzed, according to the International Data Corporation. This leaves a lot of room for growth in the analysis of both structured and unstructured data.
The challenge is developing effective, customizable extraction while respecting individual privacy. For businesses, this will be necessary to tap into the $3 billion-a-year text analytics market, which is estimated to be worth $6 billion by 2020. For governments, this analysis could be critical to ensuring the safety of their citizens, their infrastructure, and our very future.