You identify a target behaviour. In our case, it is a specific customer action, such as filing or not filing a complaint, accepting or rejecting an upsell offer, or cancelling or not cancelling a service.
You establish a list of predictors that you think might have an impact on that behaviour. There is no hard rule here, and the number of predictors can vary greatly from one model to another: to give a ballpark figure, anywhere from a few dozen to a few hundred. The important thing is that the predictors should be as independent from one another as possible, because strongly correlated predictors add little information and make it harder to tell which of them actually drives the prediction. Typical predictors might be the number of customer touch points in the past year, the subscription age, etc.
You build a training set: you associate each past behaviour exhibited by your clients with the corresponding predictor values (the observations). For example, a client who cancelled their contract last month had 2 touch points and had been a client since 2014. The more observations, the better. Again, there is no hard rule, but hundreds of thousands of observations is quite common.
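Assembling observations amounts to joining each customer's predictor values with the outcome that was observed for them. The sketch below mirrors the example client above; the record layout, the field names and the `build_training_set` helper are hypothetical, and a real pipeline would typically do this join in SQL or a dataframe library.

```python
from dataclasses import dataclass

@dataclass
class Observation:
    customer_id: str
    touch_points_last_year: int  # predictor
    client_since_year: int       # predictor
    churned: int                 # target: 1 = cancelled, 0 = stayed

def build_training_set(customers, churned_ids):
    """Pair each customer's predictor values with the outcome observed for them."""
    return [
        Observation(
            customer_id=c["id"],
            touch_points_last_year=c["touch_points"],
            client_since_year=c["since"],
            churned=1 if c["id"] in churned_ids else 0,
        )
        for c in customers
    ]

# Hypothetical customer records and the set of customers who cancelled.
customers = [
    {"id": "C1", "touch_points": 2, "since": 2014},
    {"id": "C2", "touch_points": 0, "since": 2019},
]
training_set = build_training_set(customers, churned_ids={"C1"})
```

Each `Observation` is one row of the training set: predictor values on the left, the target behaviour on the right.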
You pick an algorithm that uses the training set to learn a relationship (the model) between the target behaviour and the predictors by analyzing every observation. The model can be linear, with algorithms such as linear or logistic regression, or non-linear (tree-based algorithms, neural networks, etc.). The choice of algorithm ultimately comes down to the data scientist's expertise and their intuition about what the relationship between behaviour and predictors might look like. The trade-off between linear and non-linear models is essentially parametrization complexity versus range of applicable use cases: non-linear models suit more cases, but often require heavy tuning before yielding satisfying results.
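To make "the algorithm learns a relationship" concrete, here is a from-scratch logistic regression fitted by batch gradient descent on the log-loss. It is a teaching sketch under the assumption of small, numeric inputs; the learning rate, epoch count and toy data are arbitrary illustrative choices, not recommended settings.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict_proba(w, b, x):
    """Predicted probability that the target behaviour occurs for one observation."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

def fit_logistic_regression(X, y, lr=0.1, epochs=2000):
    """Learn weights w and bias b by batch gradient descent on the mean log-loss."""
    n, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi  # d(log-loss)/d(logit) for this row
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

# Hypothetical data: one predictor (touch points) and a churn label.
X = [[0], [1], [2], [3], [4], [5]]
y = [0, 0, 0, 1, 1, 1]
w, b = fit_logistic_regression(X, y)
```

In practice a library implementation (for example scikit-learn's `LogisticRegression`) would normally be used; a tree-based or neural model slots into the same fit-then-predict pattern, at the cost of more hyperparameters to tune.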
You use a subset of your historical observations that was set aside and not used for training (the validation set) to test the validity of your model, making sure that the predicted outcomes are not too far from the observed outcomes.
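A minimal hold-out check needs two pieces: a reproducible split that keeps some observations out of training, and a measure comparing predicted and observed outcomes. The sketch assumes a 20% validation share and uses plain accuracy; both the fraction and the helper names are illustrative assumptions.

```python
import random

def train_validation_split(observations, validation_fraction=0.2, seed=42):
    """Shuffle a copy of the observations and hold out a fraction for validation."""
    shuffled = list(observations)
    random.Random(seed).shuffle(shuffled)
    n_val = int(len(shuffled) * validation_fraction)
    return shuffled[n_val:], shuffled[:n_val]  # (training part, validation part)

def accuracy(predicted, observed):
    """Share of validation observations whose predicted flag matches the observed outcome."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(p == o for p, o in zip(predicted, observed)) / len(observed)

train, validation = train_validation_split(range(100))
print(len(train), len(validation))            # → 80 20
print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))   # → 0.75
```

For a rare behaviour such as churn, accuracy alone can look good even for a model that never predicts churn, so precision and recall on the churn class are usually checked as well.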
You apply your model to the test set, i.e. customers whose behaviour is not yet known, and the model predicts that behaviour from their measured predictor values. The outcome can be a boolean flag (yes / no) or an occurrence probability (e.g. there is an 85% likelihood that the customer will churn). Furthermore, some algorithms help you identify which predictors carry a strong weight in the prediction (candidate root causes, although a strong predictor is not necessarily a cause) and which ones can be discarded.
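Scoring a customer with a fitted logistic model gives the occurrence probability directly; the yes/no flag is that probability compared against a threshold. The sketch assumes hypothetical coefficients from a model trained on standardized predictors, so that comparing coefficient magnitudes is a rough proxy for predictor weight; the predictor names, values and the 0.5 threshold are all illustrative.

```python
import math

# Hypothetical coefficients of an already-fitted logistic regression whose
# predictors were standardized (mean 0, standard deviation 1); illustrative only.
COEFFICIENTS = {
    "touch_points_last_year": 1.2,
    "subscription_age_years": -0.8,
    "monthly_fee": 0.05,
}
INTERCEPT = -0.5

def churn_probability(customer):
    """Occurrence probability for one customer's standardized predictor values."""
    z = INTERCEPT + sum(COEFFICIENTS[name] * value for name, value in customer.items())
    return 1.0 / (1.0 + math.exp(-z))

def churn_flag(probability, threshold=0.5):
    """Turn the probability into a yes/no flag; the threshold is a business choice."""
    return probability >= threshold

def rank_predictors(coefficients):
    """Order predictors by absolute coefficient, largest weight first."""
    return sorted(coefficients, key=lambda name: abs(coefficients[name]), reverse=True)

customer = {"touch_points_last_year": 2.0, "subscription_age_years": -0.5, "monthly_fee": 0.0}
p = churn_probability(customer)
print(round(p, 2), churn_flag(p))   # → 0.91 True
print(rank_predictors(COEFFICIENTS))
# → ['touch_points_last_year', 'subscription_age_years', 'monthly_fee']
```

Ranking by coefficient size is only meaningful here because the predictors share a scale; tree-based libraries expose their own importance measures. A predictor at the bottom of the ranking (here `monthly_fee`) is a candidate for discarding, ideally confirmed by re-validating the model without it.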