It contains a complete database of such a fictitious shop. We assume that we want an answer to the question: "Which products from the available assortment are the most valuable for customers, and why?" Such answers can potentially help us communicate with customers, define a marketing strategy, find gaps in business assumptions, build forecasting models, and much more. Business analytics can have many uses.

Action plan

1. Setting up the environment and loading the necessary libraries.
2. Uploading datasets.
3. Data preparation.
4. Determination of product clusters based on the value of their sales.
5. Building a model that will be able to match the products from the database to the clusters defined by us on the basis of features.
6. Results review.
Environment configuration

As mentioned above, we can use Google Colab resources to build and run our notebook.

    import pandas as pd
    import matplotlib.pyplot as plt
    from sklearn.model_selection import train_test_split
    from sklearn.cluster import KMeans
    from sklearn.metrics import accuracy_score, f1_score
    from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

    # InterpretML libraries
    !pip install interpret
    from interpret import set_visualize_provider
    from interpret.provider import InlineProvider
    from interpret.glassbox import ExplainableBoostingClassifier
    from interpret import show

    # Fixing the seed parameter for pseudo-random methods gives us predictability when using them
    RANDOM_SEED = ...  # the value is missing in the source

We will mainly use the Pandas library to work with data sets, the Scikit-learn package (one of the most popular machine learning libraries in Python), and InterpretML to build the final model. InterpretML is the only library that needs to be installed, because it is not present in Google Colab by default.

Uploading datasets

    df_retail = pd.read_csv("sales receipts.")
    df_retail.head()  # Transaction data for one month

    df_products = pd.read_csv("product.csv")
    df_products.head()  # Product data

As a reminder, the above files are from Kaggle.

Data preparation

In any analytical work, the most important thing is the review and then the proper preparation of the data we work with. There can be many pitfalls along the way related to the quality of the data: it can be incomplete, in the wrong format, or contain errors.
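The three pitfalls named above (incomplete data, wrong format, errors) can each be checked with a few Pandas calls. The following is only a minimal sketch on a made-up table: the frame `df_example`, its column names, and the helper `quality_report` are hypothetical and do not come from the Kaggle files, so the checks would need to be adapted to the real columns.

```python
import pandas as pd

# Hypothetical toy data standing in for a transaction table.
# The column names are illustrative assumptions, not the Kaggle schema.
df_example = pd.DataFrame(
    {
        "product_id": [1, 2, 2, None],
        "quantity": ["1", "2", "2", "three"],
        "unit_price": [2.5, 3.0, 3.0, -1.0],
    }
)


def quality_report(df: pd.DataFrame) -> dict:
    """Summarise missing values, duplicated rows and column types."""
    return {
        "missing_per_column": df.isna().sum().to_dict(),
        "duplicate_rows": int(df.duplicated().sum()),
        "dtypes": df.dtypes.astype(str).to_dict(),
    }


report = quality_report(df_example)

# Wrong format: try to read the text column as numbers.
# Entries that cannot be parsed become NaN and can then be counted.
quantity_numeric = pd.to_numeric(df_example["quantity"], errors="coerce")
unparseable_quantities = int(quantity_numeric.isna().sum())

# Errors: a negative unit price is implausible for a sale.
negative_prices = int((df_example["unit_price"] < 0).sum())

print(report)
print("unparseable quantities:", unparseable_quantities)
print("negative prices:", negative_prices)
```

Running the same kind of report on `df_retail` and `df_products` before any modelling is one way to decide which columns need cleaning, type conversion, or removal.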