Application of Big Data Analytics: Opportunities and Challenges
Authors: Muhammad Muhammad Suleiman and Muhammad Bello Aliyu
Journal Name: International Journal on Emerging Technologies
Abstract
Companies are beginning to see the value of having large amounts of data at their disposal to make better decisions and achieve their goals. As new technologies, the Internet, and social networks emerge, the volume of digital data continues to grow; in this era, everything around us continually generates Big Data. Big data arrives at an alarming velocity, in great volume and diversity, and from a variety of sources, and extracting the most from it demands serious processing power, analytical capability, and skilled talent. Big data has sparked interest in several fields, including data mining and machine learning. This paper discusses the concept of big data and data analytics, the technologies behind it, and the various forms of this innovation designed to allow efficient data mining and information fusion from social media, as well as the new applications and paradigms that fall under the umbrella of social networks and big data. This colossal amount of data arrives from all across the world, daily, and may be structured, unstructured, or distributed; certain tools and techniques are required to manage it. This paper provides an overview of the notion of big data, together with the issues, challenges, tools, and strategies in this domain.
Keywords
Big Data, Big Data Analytics, Structured & Unstructured Data, MapReduce, Volume
Conclusion
The technological world is developing quickly. Organizations of all sizes are under pressure to be data-driven and to achieve more with less. Even though big data technologies are still in their infancy, the impact of the 3Vs of big data, which have now grown to the 5Vs, cannot be ignored. The moment has come for organizations to begin designing and deploying their MapReduce- or Hadoop-based data lakes. Organizations that have the necessary infrastructure, people, and vision are strategically positioned to advance their big data strategy and transform their businesses. A field of research called Data Science has emerged and is growing to manage big data, work with it, and derive benefits from it. Data Science is a scientific discipline concerned with inferring and investigating knowledge from large collections of data, most of which are unstructured or semi-structured. This revolution is changing the globe and has applications in finance, retail, healthcare, manufacturing, sports, and communications. Many data scientists are, and will continue to be, needed by search engines and digital marketing firms such as Google, networking companies such as Facebook, and financial and e-commerce enterprises such as Amazon.
I. INTRODUCTION
The use of Internet-connected devices aids in the generation of digital data. Cell phones, tablets, and laptops, as a result, communicate information about their owners, and the use of everyday objects by consumers is tracked through connected smart objects [1, 2]. Big Data is a data-driven approach enabled by recent technical advances that allow for high-speed data collection, storage, and analysis. Data sources that go beyond the standard company database include emails, mobile device outputs, and sensor-generated data. Data is no longer restricted to structured database entries but also includes unstructured data that lacks a standard framework [2, 5].
The three Vs of big data, according to Gartner, are volume, velocity, and variety. Gartner expanded its definition in 2012 to include veracity, which expresses the degree of trust or scepticism in data and in the outcomes of data analysis. In a 2012 study, IDC highlighted a further V, value, emphasizing that Big Data applications must deliver an additional advantage to businesses [6, 7]. Big Data Analytics is the process of turning unstructured data from call logs, mobile banking transactions, online user-generated content such as blog posts and tweets, online searches, and images into useful business information by employing computational techniques to uncover trends and patterns within and across data sets [8, 9].
When software suppliers say "Big Data Analytics," they are referring to the technology that enables a company to handle large amounts of data. This not only distinguishes Big Data storage and processing from the common, structured data that most people are familiar with, but also implies that businesses now demand robust, integrated solutions to make this data usable and applicable for business analytics. When dealing with enormous data sets, businesses confront difficulties integrating, processing, and managing them effectively and efficiently [3, 7, 8, 10].
II. TRADITIONAL DATA SYSTEMS
Around 25-35 years ago, traditional data technologies such as relational databases and spreadsheets were the primary means of storing and analysing data for businesses and organisations. These systems were created primarily to handle structured data, and the data was highly ordered. Though other digital storage systems existed, these were the most often utilised. Traditional systems solve large and difficult problems on a single computer [11]. They rely on a centralised architecture, which is inefficient and costly for huge data volumes, whereas big data is based on a distributed database architecture [12]. Traditional database systems operate on structured data, but big data works with semi-structured and unstructured data as well. Traditional databases keep small amounts of data, ranging from a few gigabytes to a few terabytes, whereas big data systems can store and analyse data in the hundreds of terabytes or petabytes range and beyond. The cost of storing a significant volume of data is thereby reduced, which aids Business Intelligence (BI) [2, 9].
III. WHAT IS BIG DATA
The term "Big Data" refers to the creation and implementation of technologies that provide the right user with the right information at the right time from a large amount of data that has been growing exponentially in our society for some time [13, 14]. The problem isn't simply dealing with ever-increasing data volumes; it's also dealing with data in ever-diverse forms, as well as data that are becoming increasingly intricate and interconnected [1, 2, 10]. "Big data" is defined by Apache Hadoop as "a dataset that cannot be gathered, handled, or processed within an acceptable scope by computational methods." [4, 12, 15].
Big Data is described by the APICS Dictionary, 14th Edition, as a set of data and technology that accesses, integrates, and reports all available data by filtering, correlating, and reporting insights not achievable with earlier data technologies. Big data also refers to data analysis on a scale beyond human capacity. Databases were once limited to serving the needs of human users who entered and retrieved information; because of the expansion of e-commerce and Internet search engines, database technology is evolving to serve both humans and machines. With the amount of data growing at a 50% annual rate, information technology is the only way to organize, process, and locate it [13, 16, 17].
IV. CHARACTERISTICS OF BIG DATA
Big Data refers to the 3Vs: large, highly diversified datasets (Volume), comprising structured, semi-structured, and unstructured data (Variety), arriving faster (Velocity) than ever before [1, 4, 10, 12].
1. Volume: This value reflects how much data is produced, stored, and handled within the system. The rise in volume is due to an increase in how much data is created and stored, as well as the need to use it [1, 6, 18].
2. Variety: The number of different kinds of data that an information system can handle keeps increasing. Because of this increase, the number of relationships, and of relationship types, between these data becomes more complicated. The same data can also be used in a variety of ways, which adds to its variety [1, 12].
3. Velocity: The term 'velocity' refers to how frequently data is created, collected, and exchanged. The data is delivered as a stream and must be analysed immediately [1, 18].
Other important characteristics of big data are:
1. Variability: This is a factor that can cause problems for those who analyse data. It refers to the inconsistency that data can show at times, hindering the process of efficiently handling and managing the data [6].
2. Veracity: The quality of the data gathered can vary greatly. The validity of the underlying data determines the accuracy of the analysis [6].
3. Complexity: Managing big data can be a difficult task, particularly when large amounts of data are gathered from various sources. To make sense of the information this data is meant to convey, it must be linked, connected, and correlated. Hence the term 'complexity' of Big Data [6].
V. TYPES OF BIG DATA
Now that we have established what qualifies as big data, let us examine the types of big data:
a) Structured Data: This is data stored in the fixed fields of a computer record, file, or database. Data that can be easily searched, processed, analysed, and published with minimal uncertainty is classed as structured data. Item prices, customer names, and postal codes are examples of structured data [16, 20].
b) Unstructured Data: This is data that does not fit into any of the fixed fields of a record or file, or is hard to label. Audio and video files, photographs, and text-based information (documents, journals, emails, and reports) are examples of unstructured data [16, 20].
c) Semi-Structured: This is the third type of big data. Semi-structured data refers to data that combines the structured and unstructured formats stated above. More specifically, it refers to data that, while not organized under a particular repository (database), contains essential information or tags that separate the various elements within the data [20]. A small sketch contrasting the three types appears below.
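The following is a minimal Python sketch contrasting the three data types; all records, field names, and values are hypothetical examples, not data from any real system.

```python
# A minimal sketch contrasting the three data types described above.
import json

# Structured: fixed fields, as in a relational table row.
structured_row = {"customer_name": "A. Bello", "postal_code": "700241", "price": 19.99}

# Semi-structured: no fixed schema, but tags/keys separate the elements (e.g. JSON).
semi_structured = json.loads('{"user": "ada", "tags": ["sensor", "iot"], "reading": {"temp": 31.5}}')

# Unstructured: free text with no fields at all; any structure must be inferred.
unstructured = "Delivery was late again, but the support team resolved it quickly."

print(structured_row["price"], semi_structured["reading"]["temp"], len(unstructured.split()))
```

Note how the structured row can be queried by field name directly, the semi-structured document must be navigated through its tags, and the unstructured text offers nothing to query until it is processed.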
VI. BIG DATA ANALYTICS
Data that exceeds the storage, processing, and computational capacity of conventional databases and data analysis approaches is referred to as big data. Big Data as a resource requires the use of tools and methods that can be applied to analyse and extract patterns from large amounts of data [1, 17, 18]. Big data analytics is the process of collecting, processing, cleaning, and analysing massive datasets to help organizations make sense of their data. The terms big data analytics, data science, business intelligence, and business analytics are all used to describe the analysis of massive data sets in organizations. Data science is defined as a set of fundamental concepts that support the extraction of knowledge and information from data [12, 21].
VII. CLASSIFICATIONS OF BIG DATA ANALYTICS
Once the data has been collected, analysis must begin. Different forms of analytics should be applied to different sorts of data. There are four types of analytics, illustrated with a toy sketch after this list [9, 15].
• Diagnostic: The goal of diagnostic analysis is to figure out why something happened. It seeks the source of a problem and is used to work out why and how an event occurred. This approach tries to identify and understand the reasons behind events and behaviours [9, 21].
• Predictive: It establishes patterns in past data and provides a list of likely outcomes for a given situation. Predictive analysis looks at both current and historical data to estimate, with probabilities, what may happen in the future. It exploits your big data to predict data that you do not yet have. This is one of the most widely used analytical methods for sales lead scoring, social media, and customer service management data [1, 3, 6].
• Descriptive: It involves asking the question, "What is happening?" It is a stage in the data-handling process that produces a summary of historical data. Data mining techniques organize the data and help in the discovery of patterns that provide insight [1, 6, 21].
• Prescriptive: It involves posing the question, "What should be done?" Building on descriptive and predictive results, prescriptive analysis recommends actions and constructs scenarios of what could happen, using a variety of approaches such as data mining and artificial intelligence [1, 9].
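To make the distinctions concrete, the toy Python sketch below runs all four analytics types over a hypothetical monthly sales series; the figures and the stocking rule are invented for illustration, and the forecasting step is deliberately naive.

```python
# A toy sketch contrasting the four analytics types on hypothetical data.
from statistics import mean

sales = [120, 135, 128, 150, 162, 171]  # hypothetical monthly units sold

# Descriptive: what is happening? Summarize the historical data.
print("average monthly sales:", mean(sales))

# Diagnostic: why did it happen? Naively, find the largest month-on-month jump.
changes = [b - a for a, b in zip(sales, sales[1:])]
print("largest jump after month:", changes.index(max(changes)) + 1)

# Predictive: what is likely to happen? Naive forecast: last value plus average change.
forecast = sales[-1] + mean(changes)
print("next-month forecast:", forecast)

# Prescriptive: what should be done? A hypothetical stocking rule built on the forecast.
print("recommended stock level:", round(forecast * 1.1))
```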
VIII. BIG DATA TECHNOLOGIES
Big data platforms handle the massive data generated by users' communications on digital devices such as tablets, smartphones, and laptops [11]. From the sheer quantity of data gathered across these devices, statistical facts can be derived to extract the value of social networks [14]. Below are some of the key technologies (sketches of several of them follow the list):
1. MapReduce: This is a programming architecture created by Google in 2004 and implemented in Hadoop for distributed computer processing of large amounts of data. Data from unstructured file systems or structured databases can be processed this way. MapReduce reflects a two-step approach: a "map" step begins data processing by splitting the work into subtasks and assigning those subtasks to different resources to complete, and a "reduce" step combines all completed subtasks into a single output and reports the findings (see the first sketch after this list) [3, 16].
2. Apache Hadoop: Doug Cutting and Mike Cafarella created Hadoop, an open-source software framework, in 2006. It was designed from the outset to handle extremely large data collections. The Hadoop Distributed File System (HDFS) and MapReduce are its two major components. HDFS is Hadoop's storage component: Hadoop partitions files into large blocks and distributes them among nodes to store the data. MapReduce is Hadoop's processing engine: Hadoop processes data by shipping code to the nodes for parallel processing [3, 8, 18, 22].
3. Apache Hive: This is a Hadoop data processing engine that uses SQL. ETL jobs and SQL queries are processed in batches with Apache Hive. Hive uses a query language called HiveQL, which is based on SQL but deviates from the SQL-92 standard in some ways (see the Hive sketch after this list) [8].
4. Apache Pig: Pig is a key Apache project that sits on top of Hadoop and provides a higher-level language for interfacing with Hadoop's MapReduce library. Pig offers a scripting language for describing processes such as reading, filtering, transforming, joining, and writing data, the same processes around which MapReduce was built. Rather than expressing these processes in many lines of Java code that uses MapReduce directly, Apache Pig lets users express them in a language reminiscent of bash or Perl scripts [3, 21].
5. Apache Spark: As a data analytics platform, Apache Spark is rapidly gaining traction. It is an open-source cluster computing framework. Because it can analyse data up to 100 times faster than Hadoop's MapReduce, Spark is commonly used as a replacement for it. Streaming data, machine learning, and interactive analysis are common use cases for Apache Spark (see the PySpark sketch after this list) [3, 8].
6. NoSQL Databases: These are designed for big volumes of dynamic data that do not require a relational data model. Continuous access to data items such as Twitter tweets, Internet server logs, and security keys is a typical use (a document-store sketch appears below) [8, 12, 18].
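The sketches that follow illustrate several of the technologies above in Python. The first is a minimal in-memory simulation of the MapReduce word-count flow; a real framework such as Hadoop distributes the map and reduce steps across many nodes, so this only illustrates the data flow, and the input documents are hypothetical.

```python
# An in-memory sketch of the two-step MapReduce pattern (word count).
from collections import defaultdict

documents = ["big data is big", "data needs processing"]  # hypothetical inputs

# Map step: each input record is turned into (key, value) pairs.
mapped = [(word, 1) for doc in documents for word in doc.split()]

# Shuffle step: pairs are grouped by key before reduction.
groups = defaultdict(list)
for word, count in mapped:
    groups[word].append(count)

# Reduce step: each group is combined into a single result.
result = {word: sum(counts) for word, counts in groups.items()}
print(result)  # {'big': 2, 'data': 2, 'is': 1, 'needs': 1, 'processing': 1}
```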
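The next sketch shows what a HiveQL query looks like when issued from Python. It assumes the third-party PyHive client library and a HiveServer2 instance reachable on localhost:10000; the database, table, and column names are hypothetical.

```python
# Issuing a batch HiveQL query from Python via the PyHive client (assumed installed).
from pyhive import hive

conn = hive.Connection(host="localhost", port=10000, database="default")
cursor = conn.cursor()

# HiveQL looks like SQL; Hive compiles it into batch jobs over data stored in Hadoop.
cursor.execute("SELECT region, COUNT(*) FROM sales_logs GROUP BY region")  # hypothetical table
for region, n in cursor.fetchall():
    print(region, n)
```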
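The following PySpark sketch repeats the word-count task, assuming the pyspark package and a local Spark runtime are available; the input file name is hypothetical.

```python
# A minimal word count in PySpark (assumes pyspark is installed).
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("word-count").getOrCreate()

# Spark keeps intermediate results in memory, the main reason it can outpace
# disk-based MapReduce on iterative and interactive workloads.
lines = spark.read.text("logs.txt")  # hypothetical input file; one row per line
words = lines.selectExpr("explode(split(value, ' ')) AS word")
words.groupBy("word").count().show()

spark.stop()
```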
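Finally, a sketch of the schema-free NoSQL model using a document store. It assumes the third-party pymongo driver and a MongoDB server on localhost; the database, collection, and fields are hypothetical.

```python
# Storing records with different shapes in one NoSQL collection (assumes pymongo).
from pymongo import MongoClient

client = MongoClient("localhost", 27017)
tweets = client.analytics_db.tweets  # database and collection are created lazily

# Documents in the same collection need not share a schema.
tweets.insert_one({"user": "ada", "text": "big data!", "hashtags": ["#bigdata"]})
tweets.insert_one({"user": "bello", "text": "hello", "geo": {"lat": 12.0, "lon": 8.5}})

for doc in tweets.find({"hashtags": "#bigdata"}):
    print(doc["user"], doc["text"])
```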
IX. BIG DATA ANALYTICS APPLICATIONS
Big data analytics is used in an assortment of businesses, including e-commerce, politics, science and technology, health, and government services [5]. Data-driven organizations from a variety of industries demonstrate the power of big data by producing more precise forecasts and making better decisions [13, 15, 17]. By analysing huge amounts of data and uncovering hidden patterns, big data applications can help organizations make better business decisions. These data sets can come from social media, sensor data, web logs, consumer feedback, and so on. Organizations are spending large sums on big data applications to uncover hidden patterns, unknown correlations, market trends, consumer preferences, and other useful business information [6, 23, 24]. The following are some examples of domains where big data can be used:
i) Government: The integration of big data into government processes allows for cost savings and efficiencies in agriculture, aviation, e-commerce, network security and intelligence, crime prediction and prevention, productivity, and tax compliance, among other things [6, 23].
ii) Manufacturing: Predictive manufacturing can help improve efficiency and allow more goods to be produced while reducing machine downtime. Such enterprises require large amounts of data, and advanced forecasting algorithms apply systematic methods to uncover significant information in these datasets [6, 19, 23]. The following are among the main advantages of applying big data in the manufacturing industry:
o Superior product quality,
o Supply planning,
o Defect tracking,
o Output prediction,
o Energy efficiency,
o Testing and modelling of novel production processes, and
o Large-scale manufacturing customization.
iii) Healthcare: Personalized medicine and prescriptive analytics, clinical risk intervention and predictive analytics, reduction of waste and care variability, automated external and internal reporting of patient data, standardized medical terminology and patient registries, and fragmented point solutions are examples of how big data analytics has already helped improve the healthcare system [6, 19, 23].
iv) Education: According to a McKinsey Global Institute report, there is a shortage of 1.5 million highly qualified data professionals and managers, and several universities, such as the University of Tennessee and the University of California, Berkeley, have launched master's programs to meet this demand. To address the same need, private boot camps have launched programs, including free ones like The Data Incubator and paid ones like General Assembly [6].
v) Internet of Things (IoT): IoT devices constantly generate data and transfer it to servers daily. This data is mined to facilitate device interconnectivity, and such mapping can be used by government bodies as well as a range of companies to improve their capabilities. IoT is being used in smart irrigation systems, traffic management, and crowd management, among other things [6, 23, 25].
vi) Media and Entertainment: The media and entertainment sectors are adopting new models to create, advertise, and distribute their content. Customers expect to be able to access digital content from any location at any time. The rise of online TV shows, Netflix channels, and similar services shows that new customers are interested not only in watching TV but also in accessing content from anywhere. Media companies target audiences by anticipating what people want to see, how to target adverts, how to monetize content, and so on. By analysing viewer behaviour, big data tools are improving the revenues of such media companies [6, 23].
X. WHY IS BIG DATA ANALYTICS NEEDED?
Big data analytics helps organizations harness their data and identify new opportunities. The results are smarter business decisions, more efficient operations, higher profits, and happier customers [15, 19, 22].
i) Decision-making is both faster and better. Organizations can examine information promptly, and act on what they learn, thanks to the speed of Hadoop and in-memory analytics, as well as the ability to analyse new sources of data [6].
ii) New products and services. With the ability to use analytics to gauge customer requirements and satisfaction comes the power to give customers exactly what they want. According to Davenport, more organizations are using big data analytics to create new goods that meet the needs of their customers [6].
iii) Cost reduction. When it comes to storing large amounts of data, big data technologies such as Hadoop and cloud-based analytics deliver significant cost savings, as well as the ability to uncover more effective ways of doing business [6].
iv) Big Data solutions are ideal for analysing semi-structured and unstructured data from an assortment of sources, as well as raw structured data [18].
v) When business measures on the data are not predefined, Big Data solutions are appropriate for iterative and exploratory analysis [18].
vi) Big Data is well suited to handling information problems that do not fit neatly into a standard relational database approach [18].
vii) Time savings. Tools like Hadoop and in-memory analytics can quickly identify new sources of data, allowing firms to analyse information and make quick decisions based on what they learn [20].
XI. BIG DATA ANALYTICS CHALLENGES
The greatest impediments to organizations adopting big data analytics are more managerial and cultural than technical, the principal barriers being a lack of understanding of how to use big data analytics to improve business performance and a lack of management bandwidth owing to competing priorities [13, 22]. According to various industry studies, organizations use less than 50% of their structured data in decision-making, while under 1% of their unstructured data is analysed or exploited, 70% of employees have access to data they should not, and analysts spend 80% of their time finding and preparing data [13, 19, 24].
1. Leadership. With respect to management challenges, enterprises that succeed in the data-driven era have leadership teams that set goals, define achievements, and ask the right questions to be answered by data insights. Despite its technological nature, the power of big data cannot be exploited without vision or human insight. Thus, leaders of enterprises with a vision and the ability to spot future trends and opportunities will act creatively and motivate their teams to work efficiently towards their goals [21].
2. Talent management: This term refers to the process of developing human capital with a high degree of technical skill to use and exploit these technologies and produce actionable information for end users, principally the C-suite, which enterprises require in order to leverage data through big data analytics. Statistics, big data mining, expert visualization tools, a business-oriented attitude, and machine learning are among the distinctive capabilities these people have; all are crucial for deriving valuable insights from big data and contributing to decision-making [13]. However, such people (data scientists, data analysts, and so on) are decidedly rare, so demand for them is enormous. Finding data scientists who are skilled in both analytics and subject knowledge is difficult, and in general there are fewer data scientists available than are needed [24].
3. Decision-making process and quality. The quality of decision-making under a data-driven system is a key aspect of maximizing the benefits of big data analytics. In this context, elements such as the quality of data from big data sources, big data analytics capabilities, staff, and decision-maker quality are all linked to decision-making quality [36]. The accuracy of big data sources is significant in delivering high value in decision-making and avoiding mistaken actions, while big data analytics capability depends on the use of the right methodologies and tools by big data analytics experts [24].
4. Data privacy. Many individuals view data collection with deep suspicion; for them, big data is an encroachment on their privacy. Marketers are battling consumers' perception of data use: 71% believe that brands with access to their data use it unethically, and 58% have avoided a digital service because of privacy concerns, which drive decisions about which applications to download, which email addresses to share, and which social media sites to connect with other sites. Thus, organizations must put safeguards in place to ensure that data is not exploited in ways that breach customers' privacy [7]. Data policies, including privacy, security, intellectual property, and liability issues, should be handled in this manner to maximize the potential of big data [24].
5. Utilization of new technology. Many organizations that recognize the value of data have built technical capabilities in business intelligence and/or data warehousing; however, big data analytics solutions are different and novel. Therefore, organizations must adapt existing approaches and technology to extract value from big data. Since these technologies are constantly evolving, IT departments should be able to extend their capabilities and keep pace with ongoing development. For instance, when database software does not support big data analysis, problems will arise [24].
How to cite this article
Suleiman, Muhammad Muhammad and Aliyu, Muhammad Bello (2022). Application of Big Data Analytics: Opportunities and Challenges. International Journal on Emerging Technologies, 13(1): 30–35.