Unit 4: Managing data resources

First Semester Management Information Systems Notes

Introduction:

Managing data resources is an important task for any organization that relies on data to make informed decisions. Effective management of data resources can help ensure that the data is accurate, complete, and up-to-date, and that it is easily accessible to those who need it. Here are some key steps in managing data resources:

  1. Data collection: The first step in managing data resources is to collect the data. Data can come from a variety of sources, including surveys, customer transactions, and social media. Collecting the data in a structured and consistent manner makes it easier to analyze later.

  2. Data storage: Once the data is collected, it needs to be stored in a way that is secure and easily accessible. This can involve using a database management system (DBMS) to store the data, or using cloud-based storage solutions.

  3. Data cleaning: Data cleaning involves identifying and correcting errors in the data, such as missing values, duplicates, and inconsistencies. This step is important to ensure that the data is accurate and can be used effectively.

  4. Data analysis: After the data is cleaned, it can be analyzed to identify patterns and trends. This can involve using statistical analysis, data mining techniques, or machine learning algorithms.

  5. Data visualization: Data visualization involves presenting the data in a way that is easy to understand and interpret. This can involve using charts, graphs, and other visual aids to highlight key insights.

  6. Data governance: Data governance involves establishing policies and procedures for managing data, such as data security and privacy policies. This step is important to ensure that the data is used ethically and responsibly.
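
The cleaning and analysis steps above can be illustrated with a minimal sketch. The records, field names, and the clean function are all hypothetical and invented for this example; a real project would more likely rely on a DBMS or a data-analysis library.

```python
from statistics import mean

# Hypothetical raw records, as they might arrive from an order form.
raw_records = [
    {"order_id": 1, "customer": "Asha", "amount": "120.50"},
    {"order_id": 2, "customer": "Bikram", "amount": ""},       # missing value
    {"order_id": 1, "customer": "Asha", "amount": "120.50"},   # duplicate
    {"order_id": 3, "customer": " chandra ", "amount": "80"},  # inconsistent formatting
]


def clean(records):
    """Drop duplicates and rows with a missing amount; normalize names and types."""
    seen_ids = set()
    cleaned = []
    for row in records:
        if row["order_id"] in seen_ids:
            continue  # duplicate order
        if not row["amount"].strip():
            continue  # missing amount
        seen_ids.add(row["order_id"])
        cleaned.append({
            "order_id": row["order_id"],
            "customer": row["customer"].strip().title(),
            "amount": float(row["amount"]),
        })
    return cleaned


# Analysis step: a simple average over the cleaned data.
cleaned_records = clean(raw_records)
average_amount = mean(r["amount"] for r in cleaned_records)
```

Here the duplicate and the incomplete row are dropped before the average is computed, so the errors in the raw data do not distort the result.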

Fundamentals of data concepts

  1. Definition of Data: Data refers to facts, figures, statistics, and other pieces of information that can be collected, organized, and analyzed to gain insights and make informed decisions. Data can come from a variety of sources, including surveys, customer transactions, and social media.

  2. Types of Data: There are two main types of data: quantitative and qualitative. Quantitative data is numerical in nature and can be measured or counted, while qualitative data is descriptive in nature and captures qualities or characteristics rather than numeric amounts. For example, the number of sales made by a company is quantitative data, while customer feedback about the quality of products is qualitative data.

  3. Data Quality: Data quality refers to the accuracy, completeness, consistency, and reliability of data. Poor data quality can lead to incorrect analysis and decision-making. Data quality can be affected by a variety of factors, including errors in data entry, missing data, and inconsistencies in data formatting.

  4. Data Management: Data management refers to the process of collecting, storing, organizing, and maintaining data. This includes tasks such as data cleaning, data integration, and data governance. Effective data management is important to ensure that data is accurate, accessible, and secure.

  5. Data Analytics: Data analytics refers to the process of analyzing data to gain insights and make informed decisions. This can involve using statistical analysis, data mining techniques, or machine learning algorithms. Data analytics can help organizations identify patterns and trends in data, and make data-driven decisions that improve business outcomes.

  6. Big Data: Big data refers to the large and complex data sets that are generated by modern technologies, such as social media and the internet of things (IoT). Big data poses unique challenges for data management and analytics, as traditional methods may not be sufficient to process and analyze such large volumes of data. New technologies and techniques, such as distributed computing and machine learning, are often required to manage and analyze big data.

  7. Data Privacy and Security: Data privacy and security refer to the protection of sensitive data from unauthorized access, use, or disclosure. This is an important consideration in data management and analytics, as data breaches can have serious consequences for individuals and organizations. Data privacy and security can be addressed through a variety of measures, including encryption, access controls, and data governance policies.
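
As a rough illustration of the two data types described above, quantitative values support arithmetic summaries, whereas qualitative values are usually summarized by counting categories. The sample figures and labels below are invented for the example.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical sample data, invented for illustration.
monthly_sales = [120, 95, 130, 110, 95]                          # quantitative
feedback_labels = ["good", "poor", "good", "excellent", "good"]  # qualitative

# Quantitative data can be averaged, ordered, and totaled.
quantitative_summary = {
    "mean": mean(monthly_sales),
    "median": median(monthly_sales),
    "total": sum(monthly_sales),
}

# Qualitative data is summarized by how often each category occurs.
qualitative_summary = Counter(feedback_labels).most_common()
```

Averaging the feedback labels would be meaningless, which is why the type of data determines which analysis techniques apply.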

Database management

Database management refers to the process of organizing, storing, retrieving, and maintaining data in a structured and secure manner. Databases are used to store large volumes of data in a way that allows for efficient retrieval and manipulation of information. Effective database management is crucial for organizations to ensure that they have accurate and up-to-date data that can be used to make informed decisions.

The following are some key aspects of database management:

  1. Database Design: Database design is the process of creating a logical and physical model of a database. This includes identifying the entities (such as customers, orders, and products) and their relationships, as well as defining the attributes (such as name, address, and price). A well-designed database can improve data accuracy, reduce data redundancy, and enhance data security.

  2. Data Modeling: Data modeling is the process of creating a visual representation of data structures and relationships. This includes using diagrams such as entity-relationship diagrams (ERDs) to illustrate the relationships between entities and their attributes. Data modeling helps to ensure that the database is structured in a way that meets the organization’s needs and supports its business processes.

  3. Database Administration: Database administration involves managing and maintaining the database, including tasks such as database backup and recovery, database security, and performance tuning. Database administrators (DBAs) are responsible for ensuring that the database runs smoothly and efficiently, and that data is secure and available when needed.

  4. Data Retrieval and Manipulation: Data retrieval and manipulation refers to the process of accessing and changing data stored in a database. In relational databases this is done using Structured Query Language (SQL), the standard language for defining, querying, and modifying data. SQL allows users to insert, retrieve, update, and delete data, as well as perform complex queries and aggregations.

  5. Data Security: Data security is a critical aspect of database management, as databases can contain sensitive and confidential information. This includes implementing access controls to ensure that only authorized users can access the data, as well as encrypting data to protect it from unauthorized access.
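
The design and retrieval aspects above can be sketched with Python's built-in sqlite3 module. The table names, columns, and rows are hypothetical, chosen to match the customers and orders entities mentioned earlier; this is one possible layout, not a recommended schema.

```python
import sqlite3

# In-memory database so the sketch leaves no files behind.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Design: two entities (customers, orders) related through a foreign key.
conn.executescript("""
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    );
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
        price       REAL NOT NULL
    );
""")

# Insert rows using parameterized statements rather than string formatting.
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [(1, "Asha"), (2, "Bikram")])
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(10, 1, 50.0), (11, 1, 25.0), (12, 2, 40.0)])

# Update one order and delete another.
conn.execute("UPDATE orders SET price = ? WHERE order_id = ?", (30.0, 11))
conn.execute("DELETE FROM orders WHERE order_id = ?", (12,))

# Aggregation: total order value per customer, including customers with no orders.
totals = conn.execute("""
    SELECT c.name, COALESCE(SUM(o.price), 0) AS total
    FROM customers AS c
    LEFT JOIN orders AS o ON o.customer_id = c.customer_id
    GROUP BY c.customer_id
    ORDER BY c.customer_id
""").fetchall()

conn.commit()
conn.close()
```

Storing each customer once and referring to it by key from the orders table is what reduces redundancy: a change of name is made in one row rather than in every order.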

Types of databases (centralized and distributed)

Databases can be classified into different types based on their structure and location. Two of the most common types of databases are centralized and distributed databases.

  1. Centralized Databases: A centralized database is a type of database in which all the data is stored in a single location or server. It is controlled by a central authority and can be accessed by authorized users from different locations. In a centralized database, all data processing tasks are performed by a single computer, which can make it easier to maintain and manage the data. This type of database is best suited for small to medium-sized organizations that have a limited number of users and a single location for data storage.

Advantages of centralized databases:

  • Data is stored in a single location, making it easier to maintain and manage
  • Centralized databases can be more secure as access to data can be tightly controlled
  • Backup and recovery can be managed more easily

Disadvantages of centralized databases:

  • Single point of failure: If the centralized server fails, the entire database becomes unavailable
  • Scalability can be an issue as the system may not be able to handle a large amount of data or a large number of users
  • Network congestion can affect performance as all users access the database through a single server

  2. Distributed Databases: A distributed database is a type of database in which data is stored in multiple locations or servers. Each server has a portion of the data and can perform data processing tasks independently. The servers are connected by a network, and users can access the database from any location. In a distributed database, data can be replicated across multiple servers to ensure availability and reliability. This type of database is best suited for large organizations with a high volume of data and a large number of users.

Advantages of distributed databases:

  • Improved availability and reliability as data is stored on multiple servers
  • Better scalability as new servers can be added to handle additional data and users
  • Improved performance as data can be accessed from the server closest to the user

Disadvantages of distributed databases:

  • Complexity: managing and coordinating data across multiple servers can be complex and requires specialized skills
  • Security can be a challenge as data is stored in multiple locations and needs to be protected from unauthorized access
  • Network latency can affect performance if data needs to be retrieved from a remote server
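
The availability advantage of a distributed database can be illustrated with a toy sketch. The Node and DistributedStore classes and the server names are hypothetical; real distributed databases handle consistency, recovery, and networking in far more sophisticated ways.

```python
import hashlib


class Node:
    """A hypothetical server holding a portion of the data."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.online = True


class DistributedStore:
    """Toy store: each key is copied to `replicas` consecutive nodes."""

    def __init__(self, node_names, replicas=2):
        self.nodes = [Node(n) for n in node_names]
        self.replicas = min(replicas, len(self.nodes))

    def _owners(self, key):
        # A stable hash picks the first owner; the following nodes hold copies.
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        start = int(digest, 16) % len(self.nodes)
        return [self.nodes[(start + i) % len(self.nodes)]
                for i in range(self.replicas)]

    def put(self, key, value):
        # Simplification: writes go to every owner, ignoring node status.
        for node in self._owners(key):
            node.data[key] = value

    def get(self, key):
        # Read from the first owner that is still reachable.
        for node in self._owners(key):
            if node.online:
                return node.data[key]
        raise ConnectionError(f"no replica of {key!r} is reachable")


store = DistributedStore(["server-a", "server-b", "server-c"], replicas=2)
store.put("customer:1", {"name": "Asha"})

# Simulate failure of the primary owner: the read falls back to a replica.
primary = store._owners("customer:1")[0]
primary.online = False
value_after_failure = store.get("customer:1")
```

With replicas set to 1 and a single node, the same failure would make the data unreachable, which mirrors the single point of failure listed for centralized databases. The extra coordination code also hints at the added complexity of the distributed approach.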
