Create a Discussion Post:
After a data analyst has reviewed a project and has completed an initial assessment of the data (such as the assessment we performed with Excel in Module One), data validation and data discovery tasks need to be performed.
Data validation steps include bringing the data into a tool such as RStudio and running preliminary checks, for example for missing values, incorrect data types, and out-of-range entries, to confirm that the data we are working with is valid.
In the data discovery stage, we explore results that do not meet the expectations we developed during our analysis of the data.
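As a rough illustration of what preliminary validation can look like, the sketch below runs a few basic checks on a small data set. It is written in Python with only the standard library as an assumption; the same checks could be run in RStudio or another tool. The sample data, the column names (`id`, `age`, `income`), the function name `validate_rows`, and the 0 to 120 age range are all hypothetical and chosen only for this example.

```python
import csv
import io

# Hypothetical sample data; columns and values are invented for illustration.
SAMPLE_CSV = """id,age,income
1,34,52000
2,,61000
3,29,abc
3,29,48000
4,212,39000
"""


def validate_rows(csv_text, min_age=0, max_age=120):
    """Return a list of (line_number, message) pairs, one per problem found."""
    problems = []
    seen_ids = set()
    reader = csv.DictReader(io.StringIO(csv_text))
    # Line 1 of the file is the header, so data rows start at line 2.
    for line_number, row in enumerate(reader, start=2):
        # Check 1: missing values in any expected column.
        for column in reader.fieldnames:
            value = row.get(column)
            if value is None or value.strip() == "":
                problems.append((line_number, f"missing {column}"))

        # Check 2: duplicate identifiers.
        row_id = row.get("id")
        if row_id in seen_ids:
            problems.append((line_number, "duplicate id"))
        seen_ids.add(row_id)

        # Check 3: numeric columns must parse as numbers.
        for column in ("age", "income"):
            value = (row.get(column) or "").strip()
            if value == "":
                continue  # already reported as missing
            try:
                number = float(value)
            except ValueError:
                problems.append((line_number, f"non-numeric {column}"))
                continue
            # Check 4: age must fall inside the assumed plausible range.
            if column == "age" and not (min_age <= number <= max_age):
                problems.append((line_number, "age out of range"))
    return problems


problems = validate_rows(SAMPLE_CSV)
for line_number, message in problems:
    print(f"line {line_number}: {message}")
```

Each reported problem is a candidate for follow-up in the data discovery stage, where the analyst decides whether the value is a data-entry error, a legitimate outlier, or a sign of a problem in how the data was collected.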
Describe a situation in which a data analyst might have difficulty validating data.
- Give an example of what you think may cause data to be invalid.
- Describe steps that you may take to confirm whether a data set is valid.
- If data analysis is performed on invalid data, how could this affect a project?