Five Question Blitz: Deep Dive into Data

January 7, 2021

In this week's Five Question Blitz we focus on data. Data is the foundation of Data & Analytics, and it's important to have a sound understanding to build upon.

The Five Question Blitz was created to answer five questions relevant to Data & Analytics. Topics will be broad and answers will be simplified. Our goal is to promote common definitions and increase the general knowledge of individuals with interest in Data & Analytics.

What is Data?

Data, from a business perspective, is the transactional recording of the operations of the business. It is a log of actions and outcomes that requires structure and interpretation to be transformed into insight.

What are common data formats?

For the purpose of Data & Analytics, data formats are highly dependent on their origin and purpose. Origin determines whether the data represents an image, audio, a text string, and so on. Purpose determines factors such as human readability, long-term storage, storage efficiency, and process support.

What are common data file formats?

A data file format is the system by which data is organized within a file. This organization involves factors such as human readability, process support, quality, and efficiency. Specific file formats are dependent on the type of data.

Text formats contain numbers and characters, which are stored in file formats such as JSON (JavaScript Object Notation), CSV (Comma-Separated Values), or even basic TXT files.

There are many image file formats, each optimized for its intended purpose. Image file formats include JPEG, BMP, PNG, SVG, etc.

Audio file formats are primarily segmented two ways: compressed and uncompressed. In many cases compression reduces quality but improves storage efficiency. Some common uncompressed formats include WAV and AIFF. Compressed formats include MP3, Opus, and Vorbis.

What is parsing?

Parsing is the process of extracting data stored within a file format. When parsing file formats that contain human-readable data, such as JSON or CSV, the resulting information typically follows a system. An example of this is how JSON uses attribute-value pairs: each pair names a field and then declares the value that populates it. Another example is the CSV file format, in which data is separated by commas. Each comma marks the end of one field and the beginning of the next, and each line within the CSV represents a data record.
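To make this concrete, here is a minimal sketch in Python using the standard json and csv modules. The order records and field names (order_id, customer, total) are invented for illustration and are not taken from any real system.

```python
import csv
import io
import json

# Hypothetical sample data, invented for this illustration.
json_text = '{"order_id": 1001, "customer": "Acme", "total": 250.0}'
csv_text = "order_id,customer,total\n1001,Acme,250.0\n1002,Globex,99.5\n"

# JSON: each attribute name is paired with the value that populates it.
order = json.loads(json_text)

# CSV: the first line names the fields, each later line is one record,
# and each comma separates one field from the next.
records = list(csv.DictReader(io.StringIO(csv_text)))

print(order["customer"])    # value of the "customer" attribute
print(len(records))         # number of data records in the CSV
print(records[1]["total"])  # CSV values are read as text, not numbers
```

One difference worth noting: JSON carries basic type information, so the total comes back as a number, while a CSV parser returns every field as text and leaves any conversion to the reader.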

What are data storage systems?

Data storage systems hold the transactional logs of the business. What is most important to understand is how they store the data, how quickly they can return it, and how much data they can hold without issue. Data is typically retrieved through queries (usually some flavor of SQL), and the query language may further limit or empower the user.
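As a rough illustration of querying a storage system, the sketch below uses SQLite through Python's built-in sqlite3 module. SQLite is only a stand-in chosen because it needs no setup, and the orders table and its rows are hypothetical.

```python
import sqlite3

# An in-memory SQLite database stands in for a real storage system.
conn = sqlite3.connect(":memory:")

# Hypothetical table holding a transactional log of orders.
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT, total REAL)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1001, "Acme", 250.0), (1002, "Globex", 99.5), (1003, "Acme", 40.0)],
)
conn.commit()

# A SQL query turns the raw log into a summary: total spend per customer.
rows = conn.execute(
    "SELECT customer, SUM(total) FROM orders GROUP BY customer ORDER BY customer"
).fetchall()

print(rows)
conn.close()
```

The same SELECT statement would look much the same on most SQL-based systems, although each one has its own dialect and limits.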

Join us next week to learn the difference between data storage, databases, data lakes, and data warehouses.

Click here to read Five Question Blitz: Data & Analytics.

About the Author: My name is Ion King and I am the Chief Executive Officer at SimDnA. My focus is on helping others who are passionate about growing careers in Data Science & Analytics achieve their goals. Connect with me on LinkedIn or find more of my articles on Medium.

Header image by Gerd Altmann from Pixabay.
