Hi,
I need some help! I am very new to data science and am trying to play around with some data.
Unfortunately, I do not have the flexibility or authority to change the data.
Maybe this does not even qualify as a proper use case, but I need to make sure I try everything before I give up :(
Can we predict the simultaneous occurrence of two events (the future value of Nab) based on past knowledge?
Given that we have:
Na - occurrences of A
Nb - occurrences of B
Nab - occurrences of A&B
Prashant,
You will still need a baseline value, such as how many discrete intervals (trials) were examined to generate the counts, or in the case of continuous time, how long events were being counted relative to the window of simultaneity. For example, if you know that A or B must occur in every trial, then the baseline number of trials is N = Na + Nb – Nab.
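As a small illustration of the baseline case above, here is a minimal Python sketch. The function name `baseline_trials` and the sample counts are hypothetical, and it only applies under the stated assumption that A or B occurs in every trial.

```python
def baseline_trials(na: int, nb: int, nab: int) -> int:
    """Baseline N = Na + Nb - Nab (inclusion-exclusion).

    Assumes every trial contains A, B, or both; if some trials contain
    neither event, this undercounts the true number of trials.
    """
    if min(na, nb, nab) < 0:
        raise ValueError("counts must be non-negative")
    if nab > min(na, nb):
        raise ValueError("Nab cannot exceed Na or Nb")
    return na + nb - nab


# Hypothetical counts, for illustration only.
print(baseline_trials(na=50, nb=50, nab=10))
```

If A and B are not bound to occur in every trial, the number of trials has to come from somewhere else, for example the number of time intervals that were observed.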
If you have a baseline, you can make predictions in both the discrete and continuous cases. Confidence bounds on the number of future simultaneous events Mab out of M future trials are based on a hypergeometric distribution, given that you know the baseline N. The confidence level to reject the hypothesis that Mab >= m (and hence to say that Mab < m) is
Q = 1 – sum(n = 0 to Nab) of C(n+m, n) * C(N–n + M–m, N–n) / C(N+M, N),
where C is the binomial coefficient, C(a, b) = a! / (b! (a – b)!).
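The expression for Q can be evaluated directly with exact integer binomials. The sketch below transcribes it literally as written; the function name `confidence_fewer_than` and the sample numbers are made up for illustration, and it does not check that the result lies in [0, 1], which is worth verifying for your own inputs.

```python
from math import comb


def confidence_fewer_than(m: int, M: int, N: int, Nab: int) -> float:
    """Evaluate Q = 1 - sum_{n=0}^{Nab} C(n+m, n) * C(N-n + M-m, N-n) / C(N+M, N).

    m   : threshold on future simultaneous events (claim: Mab < m)
    M   : number of future trials
    N   : baseline number of past trials
    Nab : observed past simultaneous occurrences of A and B
    """
    if not (0 <= m <= M):
        raise ValueError("need 0 <= m <= M")
    if not (0 <= Nab <= N):
        raise ValueError("need 0 <= Nab <= N")
    # Sum the numerators with exact integers, then divide once at the end.
    total = sum(
        comb(n + m, n) * comb(N - n + M - m, N - n)
        for n in range(Nab + 1)
    )
    return 1 - total / comb(N + M, N)


# Hypothetical inputs, for illustration only.
print(confidence_fewer_than(m=2, M=5, N=10, Nab=1))
```

With these made-up inputs (N = 10, Nab = 1, M = 5, m = 2) the sum is 286 + 660 = 946 and C(15, 10) = 3003, so Q = 2057/3003, roughly 0.685.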
Hope that helps.
Hi Bryan,
Thanks for your response. I understand that we need to identify a baseline. This is tricky, since A or B is not bound to occur in every trial. For example, in a total of 10000 (90-second time intervals), Na and Nb can be anywhere in that range (the average is 50).
I will still try to see whether I can define a baseline, or maybe convince the team to switch to a different algorithm such as Apriori or FP-Growth.
Regards,
Prashant
© 2019 Data Science Central ®