24 IM939 lab 5 - Exercise
The exercise this week is a chance to apply the code you have been working through over the past few weeks.
Choose one of the datasets from a previous week:
Crime Census
London Borough
Wine
Iris
Global warming
Or, if you are feeling confident, pick another dataset from sklearn.datasets.
Note: Please refer to the sklearn and pandas documentation if you get stuck.
Read in the data using pandas.
Look at the first few rows. Get a feel for the structure of the data.
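As a minimal sketch of these first two steps, the block below loads Iris from scikit-learn so that it runs without any local files. That choice is an assumption made for illustration: the datasets from previous weeks were read from CSV files, and the path in the comment is hypothetical.

```python
import pandas as pd
from sklearn.datasets import load_iris

# Iris ships with scikit-learn, so no local file is needed for this sketch.
# as_frame=True returns pandas objects; .frame holds features plus 'target'.
df = load_iris(as_frame=True).frame

# For a CSV from a previous week you would instead write something like:
# df = pd.read_csv("data/your_file.csv")  # hypothetical path

print(df.head())    # first five rows
print(df.shape)     # (number of rows, number of columns)
print(df.dtypes)    # data type of each column
```

Checking the shape and dtypes early tends to catch problems such as numeric columns that were read in as text.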
Deal with missing values, if any.
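One possible way to find and handle missing values is sketched below. The small table is invented purely for illustration, and whether dropping or filling is appropriate depends on your dataset and question.

```python
import numpy as np
import pandas as pd

# Toy data, invented for illustration only.
df = pd.DataFrame({
    "height": [1.2, np.nan, 1.5, 1.7],
    "weight": [30.0, 42.0, np.nan, 55.0],
    "group": ["a", "b", "b", None],
})

# Count the missing values in each column.
missing_counts = df.isna().sum()
print(missing_counts)

# Option 1: drop every row that has at least one missing value.
dropped = df.dropna()

# Option 2: fill gaps in the numeric columns with that column's median.
numeric_cols = df.select_dtypes(include="number").columns
filled = df.copy()
filled[numeric_cols] = filled[numeric_cols].fillna(filled[numeric_cols].median())

print(dropped)
print(filled)
```

Dropping rows is simple but can discard a lot of data (here only one row survives), while filling keeps every row at the cost of inventing values. Note that the non-numeric column is left untouched by the second option.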
Create a summary of the data. Plot any particular features or groups of features which you think are of interest.
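A summary and a first plot might look like the sketch below. It assumes the Iris data from scikit-learn and picks petal length arbitrarily as the feature of interest; substitute your own dataset and columns.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris

df = load_iris(as_frame=True).frame

# Count, mean, standard deviation, min, quartiles and max per numeric column.
summary = df.describe()
print(summary)

# Mean of each feature within each class (the 'target' column).
group_means = df.groupby("target").mean()
print(group_means)

# Histogram of a single feature. In a notebook the figure renders inline;
# in a script, call plt.show() afterwards.
fig, ax = plt.subplots()
df["petal length (cm)"].plot.hist(bins=20, ax=ax)
ax.set_xlabel("petal length (cm)")
ax.set_title("Distribution of petal length")
```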
Settle on a possible question you want to answer. What might you be able to learn from your dataset?
Decide on your initial analysis. Remember, we have covered:
- Linear regressions
- Dimension reduction
- Clustering
Which method will best allow you to tackle your question?
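To make the choice more concrete, here is a rough sketch of what each of the three methods looks like in scikit-learn. The Iris data, the columns used for the regression, and the choice of two components and three clusters are all assumptions made for illustration, not recommendations.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

df = load_iris(as_frame=True).frame
features = df.drop(columns="target")

# Linear regression: how well does one variable predict another?
X = df[["petal length (cm)"]]
y = df["petal width (cm)"]
reg = LinearRegression().fit(X, y)
r_squared = reg.score(X, y)  # R^2 on the data the model was fitted to
print("R^2:", r_squared)

# Dimension reduction: can a few components summarise many features?
scaled = StandardScaler().fit_transform(features)  # put features on one scale
pca = PCA(n_components=2)
components = pca.fit_transform(scaled)
print("Explained variance ratio:", pca.explained_variance_ratio_)

# Clustering: do the observations fall into natural groups?
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(scaled)
print("Cluster sizes:", [int((labels == k).sum()) for k in range(3)])
```

Roughly speaking, regression suits questions about predicting or explaining one variable, dimension reduction suits questions about overall structure across many variables, and clustering suits questions about groups.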
Apply your chosen analysis method below. Feel free to refer to, and copy and paste code from, previous weeks.
Can you visualise your result?
You may want to use a:
- Scatterplot
- Histogram
- Any other plot, such as those in the seaborn example library.
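The sketch below shows one way to draw a scatterplot and a histogram side by side with matplotlib. It assumes Iris, a two-component PCA and a three-cluster k-means result as the thing being visualised; your own result will need different columns and labels.

```python
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

df = load_iris(as_frame=True).frame
scaled = StandardScaler().fit_transform(df.drop(columns="target"))
components = PCA(n_components=2).fit_transform(scaled)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(scaled)

fig, (ax_scatter, ax_hist) = plt.subplots(1, 2, figsize=(10, 4))

# Scatterplot: each point is one flower, coloured by its cluster label.
ax_scatter.scatter(components[:, 0], components[:, 1], c=labels)
ax_scatter.set_xlabel("PC 1")
ax_scatter.set_ylabel("PC 2")
ax_scatter.set_title("Clusters in PCA space")

# Histogram: distribution of a single feature.
ax_hist.hist(df["sepal width (cm)"], bins=15)
ax_hist.set_xlabel("sepal width (cm)")
ax_hist.set_title("Sepal width")

fig.tight_layout()
# In a notebook the figure renders inline; in a script, call plt.show().
```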
Are you able to check whether your method is robust (e.g., k-fold cross-validation for regressions or stability checks for clusters)? Perhaps do that below.
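Both kinds of check can be sketched as follows. This assumes Iris, five folds, and three clusters, and it uses the adjusted Rand index across random seeds as one simple notion of cluster stability; other stability checks are equally valid.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.linear_model import LinearRegression
from sklearn.metrics import adjusted_rand_score
from sklearn.model_selection import KFold, cross_val_score
from sklearn.preprocessing import StandardScaler

df = load_iris(as_frame=True).frame

# k-fold cross-validation of a regression.
# shuffle=True matters here because the Iris rows are ordered by class.
X = df[["petal length (cm)"]]
y = df["petal width (cm)"]
folds = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(LinearRegression(), X, y, cv=folds, scoring="r2")
print("R^2 per fold:", scores)
print("Mean and std:", scores.mean(), scores.std())

# Cluster stability: rerun k-means with different seeds and compare each
# labelling with a reference run. An adjusted Rand index of 1 means the two
# runs found identical groupings (cluster numbering does not matter).
scaled = StandardScaler().fit_transform(df.drop(columns="target"))
reference = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(scaled)
agreement = []
for seed in range(1, 6):
    other = KMeans(n_clusters=3, n_init=10, random_state=seed).fit_predict(scaled)
    agreement.append(adjusted_rand_score(reference, other))
print("Mean agreement with reference run:", np.mean(agreement))
```

If the fold scores vary widely, or the agreement between runs is well below 1, the result probably depends too much on which rows or which starting points happened to be used.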
So, what have you learned?
You may want to consider if you could convince a friend of your conclusion. Perhaps another type of analysis is needed or there are issues with the analysis you chose above!
Use the space below to explore a bit more.