In this week's Five Question Blitz we focus on data. Data is the foundation of Data & Analytics, and it's important to have a sound understanding of it to build upon.
The Five Question Blitz was created to answer five questions relevant to Data & Analytics. Topics will be broad and answers will be simplified. Our goal is to promote common definitions and increase the general knowledge of individuals with an interest in Data & Analytics.
Data, from a business perspective, is the transactional recording of the operations of the business. It is a log of actions and outcomes that requires structure and interpretation to be transformed into insight.
For the purpose of Data & Analytics, data formats are highly dependent on their origin and purpose. Origin dictates whether the data represents an image, audio, a text string, etc. Purpose dictates factors such as human readability, long-term storage, storage efficiency, process support, etc.
Data file formats are the system by which data is organized. This organization includes factors such as human readability, process support, quality, and efficiency. Specific file formats are dependent on the type of data.
There are many image file formats, each optimized for its intended purpose. Image file formats include JPEG, BMP, PNG, SVG, etc.
Audio file formats are primarily segmented two ways: compressed and uncompressed. In many cases compression reduces quality but improves storage efficiency. Common uncompressed formats include WAV and AIFF. Compressed formats include MP3, Opus, and Vorbis.
Parsing is the process of extracting data stored within a file format. When parsing file formats that contain human-readable data, such as JSON or CSV, the resulting information typically follows a system. For example, JSON uses attribute-value pairs: each pair names a field and then declares the value that populates it. A CSV file, by contrast, contains data separated by commas. Each comma marks the end of one field and the beginning of the next, and each line within the CSV represents one data record.
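To make the two parsing systems concrete, here is a minimal sketch using Python's standard `json` and `csv` modules. The sample order data and the function names `parse_json_record` and `parse_csv_records` are made up for illustration; real files will have their own fields and quirks.

```python
import csv
import io
import json

# Hypothetical sample data, invented for illustration only.
json_text = '{"order_id": 1001, "customer": "Acme", "total": 49.95}'
csv_text = "order_id,customer,total\n1001,Acme,49.95\n1002,Globex,15.00\n"


def parse_json_record(text):
    """Turn JSON attribute-value pairs into a Python dictionary."""
    return json.loads(text)


def parse_csv_records(text):
    """Turn CSV text into one dictionary per line, keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


order = parse_json_record(json_text)
rows = parse_csv_records(csv_text)

print(order["customer"])   # field looked up by its attribute name
print(len(rows))           # one record per data line in the CSV
print(rows[1]["total"])    # CSV values come back as text, not numbers
```

Note one practical difference: JSON carries basic types, so `order_id` arrives as a number, while every CSV value arrives as text and has to be converted by whoever reads it.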
Data storage systems hold the transactional logs of the business. What is most important to understand is how they store the data, how quickly they can return it, and how much data they can hold without issue. Data is typically stored and retrieved through queries (usually some flavor of SQL), and the query language itself may further limit or empower the user.
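As a simplified illustration of storing and querying data with SQL, the sketch below uses SQLite through Python's built-in `sqlite3` module. The `orders` table, its columns, and the `total_by_customer` function are assumptions made for this example; a production storage system would differ in scale and in its SQL dialect.

```python
import sqlite3


def total_by_customer(rows):
    """Store order rows in a temporary database and total them per customer."""
    conn = sqlite3.connect(":memory:")  # in-memory database, nothing written to disk
    try:
        conn.execute(
            "CREATE TABLE orders ("
            "order_id INTEGER PRIMARY KEY, customer TEXT, total_cents INTEGER)"
        )
        # Placeholders (?) keep the data separate from the SQL text.
        conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
        cursor = conn.execute(
            "SELECT customer, SUM(total_cents) FROM orders "
            "GROUP BY customer ORDER BY customer"
        )
        return cursor.fetchall()
    finally:
        conn.close()


# Hypothetical transactions, invented for illustration (amounts in cents).
sample = [(1001, "Acme", 4995), (1002, "Globex", 1500), (1003, "Acme", 1005)]
totals = total_by_customer(sample)
print(totals)
```

The query is where structure and interpretation come in: the raw rows are a log of transactions, and the `GROUP BY` turns them into a per-customer summary.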
Join us next week to learn the difference between data storage, databases, data lakes, and data warehouses.
About the Author: My name is Ion King and I am the Chief Executive Officer at SimDnA. My focus is on helping others who are passionate about growing careers in Data Science & Analytics achieve their goals. Connect with me on LinkedIn or find more of my articles on Medium.