22 April 2022: “Data Science with Open Canada Data” Seminar
Dmitry Gorodnichy, dg@ivim.ca
Transcript
hello everyone
welcome to our webinars on using data science to analyze open canada data, specifically the data which relates to vaccines. this is the data which is of highest interest right now to many people, even those who have opposite opinions about their efficacy and their safety, and with the data we hope to bridge the divide between those two populations in our country, because data is just the data and we try to use data science techniques to make it more understandable. we build tools to visualize the data, to track the data over time.
so as usual i will start sharing my screen and we'll just go through the latest results which have been published this week by the public health agency of canada. okay, so i'm sharing the screen right now, and in the meantime, everyone, please go to the website ivim.ca.
ivim stands for independently verified information machine. essentially the idea is to automate the process of visualizing and validating the data. now you should be able to see the screen right now. please put your thumbs up so that i know you can see my screen.
also as usual we have a chat window, and i'm monitoring the chat window at the same time. so if there are any questions you can type your questions right here. also this video will be posted on youtube and also will be discussed on twitter, and you could pose your questions there as well.
so much has been done in the last two weeks, since we talked about the latest data from the public health agency of canada. we have entirely rearranged the dashboard which is on the ivim.ca website. now as you go there you will see a collection of time series data. time series is a very important piece of information. in fact, in data analysis a time series is probably the most important piece of evidence you could have. ideally, you should never just report the aggregates or the averages of any statistic or any measurement. you do need to see how the measurement - whatever measurement it is - changes over time, when it goes up, when it goes down, and then you can take the average of it, you can put a trendline, and you understand what's going on - the life behind the numbers, right? so, this is what we're trying to do here - we're trying to track all those different metrics and statistics over time, and that's exactly what the portal shows right now.
now you remember we have the page which lists all the data sources. i suggest we open this page at the same time: data sources and spreadsheet. it shows you again the official public health agency of canada data sources which we're using, listed right here. this is the covid daily epidemiology update, which is published weekly, and once a week they also provide an update on cases following vaccination, which is very important because this is the only piece of evidence we have from post-marketing - meaning after the vaccines have already been used, not before they were used or tested in the lab somewhere, right? also on the same page, there are statistics related to variation by age: for people dying from covid, what is the age, what is the average age of people who are dying from covid, and we will open this web page - and you see i have already opened it on the screen, it's right here. and as always we will just go - one by one - through all those tables and see what is happening, right?
the other piece of information or source which we analyze relates to reported side effects following vaccinations. today we will not have time to talk about that, but again the link is right there, and the data from this source is also visualized on the dashboard. in fact, that's the first graph which you would see here - it's taken exactly from this report for side effects. and we will go one by one to all those graphs.
now we have added a few more references which we are not using yet, but we would like to use them as well. in fact, last time during our last seminar two weeks ago, one of the comments we have received from the audience was - can we correlate canadian data with data coming from other countries, let's say from uk or from new zealand? indeed, very very good point, because for example in canada they do not report data, vaccination data - cases by vaccination - divided into age groups, whereas in uk they do provide this information, and that's why we have included uk data right here as well, and i have the file already opened on the screen. so this is the file from official uk data, which is shown right now here, which is called vaccine weekly surveillance reports, and we have cached those reports as well and copied them on our local ivim.ca website. and one is right there.
and you would see a table which is very similar to the table which we have analyzed from canada. it shows the number of cases - just cases, then cases presented to emergency care, table 11, then table 12, covid deaths within 28 days and within 60 days of being tested positive, divided - very importantly - by vaccination status, and you see: not vaccinated, then receiving one dose, receiving two doses, and three doses, but also categorized by age, you see. so this is very, very important.
in canada, they do not provide categorization by age. they just put all the statistics together, and that's why we're going now back into our canadian daily epidemiology update, and we click on this link 'Cases following vaccination'. it's table number two, and it is shown right now on your screen. you see, this is the table. the canadian table is much smaller than the table from the united kingdom, but it still shows very useful information: for unvaccinated versus fully vaccinated and fully vaccinated with an additional dose, and you see how many deaths have happened in each of those categories. and you remember what we do then: we just take this data - exactly as typed here - and we move it to our spreadsheet, which is shown right now on your screen - it's the last line right here. so we just cut and paste all the data from the official website, so that we can see how those numbers change over time, right? and for example, you see that the percentage of fully vaccinated divided by total has increased from 27.9 to 28.4.
let's just check - that's exactly the number which is shown right now in here 17.5, plus 10.9 gives you exactly the percentage which is shown here. and then you can see how this percentage is growing with every week - from the very first week when this data has become available. and we plot it now here and we even put a trend line here. so it just goes steadily increasingly monotonically up, right? and we know why - remember we talked about that many many times that they count the cases before they were actually fully vaccinated in canada.
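the check and the trend line described here can be sketched in a few lines of python. this is only an illustration, not the spreadsheet behind the dashboard: the function name `linear_trend` is hypothetical, the only numbers used are the ones quoted in the talk (17.5, 10.9 and last week's 27.9), and two weeks are of course too few for a meaningful trend - the full weekly column from the spreadsheet would go in their place.

```python
def linear_trend(values):
    """Least-squares slope and intercept for equally spaced weekly values."""
    n = len(values)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# The two "fully vaccinated" percentages quoted in the talk for the latest week
# (which table row each belongs to is not stated in the talk).
fully_vaccinated_rows = [17.5, 10.9]
latest_share = round(sum(fully_vaccinated_rows), 1)

# Previous week's share (27.9) followed by the latest one; the real spreadsheet
# has one value per weekly report and would replace this short list.
weekly_shares = [27.9, latest_share]
slope, intercept = linear_trend(weekly_shares)

print("latest share of fully vaccinated among cases deceased:", latest_share)
print("change per week (percentage points):", round(slope, 2))
```

a positive slope corresponds to the steadily rising line on the plot.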
okay so we can go now to the dashboard, and if there are any questions, please don't hesitate to interrupt me, and i'm monitoring my chat here.
so what do we see here? first, we show weekly - again, very importantly, weekly - the dynamics of severe adverse reactions by age group, because the data does provide information for three categories: from 5 to 11, from 12 to 17, and then for everyone else.
you would see we have added a thick yellow line which we call projected - based on the reporting delay. so what is it? the green line is the actual number which is reported. let's say, right now - the last point - it's 45. so in the last week, the week of 25th of march, there were 45 serious events reported, but when we look at the projected line it will be more than 45 - it's 135.
so where does this number come from? it's obtained using a very simple machine learning technique, which learns from historical data. you just need to look into the report back in may and then into a report six months later, let's say, and compare the numbers which are reported for the same month - for the same month of may - by the two reports: the numbers reported back in may or in june and the numbers reported now. and you would see that indeed, for the majority of cases, it takes about three to four months to report those events. so we understand that everything which is reported now is not yet the final number for the month, for the week of april or march. these numbers will keep growing, and in fact in about three to four months we expect them to go from 45 to 145, just based on the historical data.
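one simple way to read this reporting-delay projection is as a completion factor: for past weeks, divide the count in a later report by the count first reported for the same week, average those ratios, and multiply the newest count by the result. the sketch below assumes that reading; it is not necessarily what runs on ivim.ca, the function names are hypothetical, and the historical pairs are made-up illustrative numbers, not values from the reports. only the 45 comes from the talk.

```python
def completion_factor(history):
    """Average growth from first-reported to later-reported counts.

    history: list of (first_reported, later_reported) pairs for past weeks.
    """
    ratios = [later / first for first, later in history if first > 0]
    if not ratios:
        raise ValueError("need at least one past week with a non-zero first count")
    return sum(ratios) / len(ratios)


def project(current_count, factor):
    """Projected final count for a week that is still being reported."""
    return current_count * factor


# Hypothetical history: each past week's count tripled once late reports arrived.
past_weeks = [(50, 150), (40, 120), (30, 90)]
factor = completion_factor(past_weeks)

# 45 is the count quoted in the talk for the latest week.
projected = project(45, factor)
print("completion factor:", factor)
print("projected final count:", projected)
```

with a factor of about three, the 45 reported events project to roughly the level of the yellow line; a different history would give a different factor.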
now what are these serious adverse reactions? many people may not understand in fact how serious they are. these are life-threatening reactions or events, and it's very important to visualize them as well - how they change over time. and that's why we added another plot here, which shows all of those major severe adverse reactions. that includes heart failures, liver failures, anaphylaxis, which is a very strong reaction of the body to the vaccine. and you can see how this changes over time. now the comment here is that we're using a logarithmic scale here - you have to be aware of that - in order to show you the small numbers, because the numbers for children, for example, are quite small compared to the overall number of events, but they are still there, and it's more than the number of deaths from covid for the same population. we show these in the following graph. it shows cases deceased by age group, and you would see that for the age from 0 to 12 we actually have zeros for the entire duration since at least june. there was one reported there. again it takes a little bit of understanding to figure out how these plots are visualizing the data, but essentially if it's less than one it shows you that it's zero.
now about the other statistic - which is the most important statistic - which is how many deaths have happened among fully vaccinated versus unvaccinated? so these are the weekly numbers which are obtained from the source file, and you see the weekly average - it is shown on the vertical axis. so these are the numbers: the blue lines are totals, and then you see different lines here for zero doses, one dose and other numbers of doses. very importantly, we recompute this for each population, because this is where you would know the effect of the vaccine in reducing the risk of dying from covid, and you would see that the red bars here, which show the number of deaths among unvaccinated (zero doses), are higher than the average, so it means that people with zero doses are more likely to die, and people with one dose or other doses are less likely. but then you would like to compare, okay, by how much? and this is called the odds.
so you can compute the odds. and we show it right here - the odds of dying of covid if you have zero doses versus one dose. so again it's like a logarithmic scale which shows you whether the odds of dying are higher or lower for a particular population [compared to another population]. so let's understand what this table is showing us, for cases deceased by vaccination status. it shows you that the red bar - especially in the beginning, in june - was about 13.12, so it was 13.12 times more likely to die of covid. or rather, i do not say 'of covid', we say 'cases deceased', because that's how they're called in canada, because it could be of covid, it could be with covid, or it could be combined with other health problems and maybe it triggered some of those other problems, or maybe it was just another cause, in addition to other causes. and you see that the red bar then goes down. so it reduces. it means that the risk of dying for unvaccinated becomes less. and actually as of now - on the third of april - we see that for the red line it's only 1.91. so it's about twice as likely, or as risky, to die with covid or of covid if you are unvaccinated. you see - so it goes down with time, it reduces. okay, that's good to know.
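to make the comparison concrete, here is a minimal sketch of how such a ratio could be computed. it assumes that what the talk calls 'the odds' is a ratio of per-capita rates (cases deceased divided by the size of each group), which the talk does not spell out; the function names and all the counts below are hypothetical and chosen only to illustrate the arithmetic.

```python
import math


def rate(deceased, population):
    """Cases deceased per person in a group."""
    if population <= 0:
        raise ValueError("population must be positive")
    return deceased / population


def risk_ratio(deceased_a, population_a, deceased_b, population_b):
    """How many times more likely group A is to be a 'case deceased' than group B."""
    return rate(deceased_a, population_a) / rate(deceased_b, population_b)


# Hypothetical weekly counts, NOT taken from the Canadian tables.
ratio = risk_ratio(
    deceased_a=40, population_a=1_000_000,   # zero doses
    deceased_b=20, population_b=1_000_000,   # one dose
)

# On a logarithmic axis, a ratio above 1 plots above zero and a ratio below 1 below it.
log_ratio = math.log10(ratio)
print("ratio:", ratio)
print("log10 of ratio:", round(log_ratio, 3))
```

a ratio of 1.91, as quoted for the third of april, would mean roughly twice the risk for the first group; a value below one means the comparison has flipped.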
similarly you can look at the effect of the third vaccine - the booster. the booster here is shown as orange, this orange line here. and you would see - it's a comparison of being vaccinated with one dose versus three doses - and you see, it's lower than one. it means that the risk of dying with three doses is actually more. so it's about ten times more, and here it's about 0.17. so this number shows that the risk of dying with one dose is less, by a factor of 0.17, compared to people who have three doses.
so you see, this is a very interesting dynamic. in the beginning this orange line - when you look at it - was above one, right? when you compare even to zero doses - zero doses is yellow - you see it's above one. it shows you that the risk with three doses is 2.54 times less than with zero doses. but then it goes down, down, down and eventually it becomes less than one. so the dose seems like it is not as efficient after several weeks as it was in the beginning. okay. so that's how these graphs are plotted, and indeed they show you the dynamics of different parameters over time.
now let's see what else we have here. we also have this one graph which shows the average since 1st of february, and this graph is probably the most succinct way of describing the phenomenon which we have just observed. it shows exactly what we have just discussed - that the highest risk of covid death is for people with zero doses. and you can compare it, right? so zero doses versus one dose, it's 2.9. then the difference is not as much when you compare zero doses to fully vaccinated, and even less when you compare it to three doses. but eventually what is most important to observe here is that one dose is actually less risky than two doses and less risky than three doses. so that is what we observe here.
another interesting graph here is the percentage of complications among covid cases by vaccination status. so if you already have a covid case, if you are infected with covid, what is the chance of you dying, right? this is shown in this graph. you would see that for zero doses it actually remains about the same, about one, one point one percent, pretty much all the time, and just slightly less since january, one percent. so only about one percent of all covid cases are dying when you look at the population of unvaccinated.
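the percentage on this graph is a simple share: cases deceased divided by all covid cases in the same vaccination group, times one hundred. a small sketch, with a hypothetical function name and hypothetical counts that are not taken from the canadian tables:

```python
def percent_deceased(cases, deceased):
    """Share of covid cases in a group that ended as 'cases deceased', in percent."""
    if cases <= 0:
        raise ValueError("cases must be positive")
    return 100.0 * deceased / cases


# Hypothetical (cases, deceased) counts per group, for illustration only.
counts = {
    "zero doses": (10_000, 100),
    "two or more doses": (20_000, 300),
}

percentages = {group: percent_deceased(c, d) for group, (c, d) in counts.items()}
for group, pct in percentages.items():
    print(f"{group}: {pct:.1f}% of cases deceased")
```

computed week by week, these percentages give the lines on the graph.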
now let's see what happens with the people who have been fully vaccinated, meaning two plus doses. you see, in the beginning there was a high percentage of them dying, right, and then it reduces over time. again we cannot conclude anything specific about the efficacy of the vaccine, because vaccines are administered to a particular category of the population which is most susceptible to covid. so that's why you cannot compare apples to oranges, as they say, because indeed it's people who are elderly, who have other health problems, who are administered the vaccine, and the vaccine is helping them, but they're still dying, right? but if you're healthy, if you don't have other health problems, indeed it's very stable. you see that it's about one percent of people who are diagnosed with covid and then dying with covid or of covid, okay?
so this is pretty much what we wanted to show you today. now if you have any questions, we could talk about those issues separately, but now i'll just go through the list of questions which we received last time. one of them was specifically about the definition of deaths. so we have created frequently asked questions on our seminar page and i will go there right away - you can do it with me at the same time. this is where we are posting slides and recordings of these seminars, and we have also added frequently asked questions here, and one of them is about covid deaths and how they are defined.
we found that in canada they are defined differently from the way they are defined in the united kingdom. in the united kingdom the office for national statistics defines them as written right here: these are deaths where the patient died within 60 days of testing positive. but there is also confusion over there, because sometimes they say within 60 days but it also includes deaths where the patient died within 28 days. so we're really not sure whether it's 60 or 28, but what is very important is that on their official website - where you remember we showed you this table from the united kingdom - they do have two separate tables: one table shows you covid deaths within 28 days, and the other within 60 days. so this becomes a very deterministic way of defining covid deaths. really, if a person has a positive covid test, his death will be added to this count. and you see there is one table (a) and one table (b) right here. so all these numbers were obtained based on the positive test result.
now in canadian data, when we look here, you would see that actually they define it in a slightly different manner, and we can see how they define it - we go to this 'Severe illnesses and outcomes' page, this is where they show variation by age, and you would see in the 'Text description' they actually don't call it 'covid deaths', they call it the 'distribution of covid cases deceased'. so there were covid cases and they were deceased. that's why we are now using this definition for canadian data. so instead of saying 'covid deaths', it is probably more correct to say 'covid cases deceased'. and this is why you would see again the numbers of 'covid cases deceased' by age. and again you see that there are practically no young people who die, who are deceased with a covid case, right?
so then i looked specifically into the definition of these 'covid cases deceased' in canada. and it is posted right here. actually it is written on official canada website. it's "a probable or confirmed covid case whose death resulted from clinically compatible illness unless there is a clear alternative cause of death identified, for example, trauma, poisoning, drug overdose and so on". and then it also says that a medical officer or coroner may use their own discretion when determining if a death was due to covid or maybe due to something else.
so in canada we see it's at the discretion of a medical officer, right? because of that we really cannot state whether the person died directly from covid or with covid, or maybe it was a contributing factor. but again, the average age of people who die of covid is about the same as the life expectancy in canada, so from here we could kind of state that indeed it's other health problems which are most likely the most important in the outcome of the covid case. okay, so we have answered this question, and if there are any other questions, again you can ask them here [in chat], on youtube, or twitter.
now you would see there is already an ongoing discussion on twitter about what they have reported in the united kingdom. actually, just last night a colleague of mine from the united kingdom sent me a link to a youtube video and suggested i watch it. it was very interesting to watch because it was shown on what in the uk they call a mainstream video channel, and it was very, very interesting to see what they discuss in the united kingdom.
so again at this point probably we will close the presentation and we will start the discussion...