Event Recording

An Analysis of Global Decentralized Identifier Data

Uploaded: 2023-05-12T12:00:00+02:00
Duration: 18 min 35 s

Posted on May 12, 2023

Decentralized Identifiers (DIDs) offer a unique solution for digital identity verification, allowing individuals to have complete control over their own identity and eliminating the need for a centralized registry or authority. In this session, we will explore the insights that can be gained through the analysis of global DID data. At Danube Tech GmbH, we have developed version trackers that monitor various DID methods, such as did:indy, did:ebsi, did:ion and others, collecting and storing data on DID transactions in our database for analysis. During this session, we will present the results of our latest analyses, including trends in DID transactions over time, distributions across different verification methods, and errors found in DIDs and DID documents. This information can be valuable for businesses looking to understand and utilize DIDs in their operations, as well as for individuals seeking to use DIDs for their own digital identity management.

Speaker

Zaïda Rivai

Data Scientist
Danube Tech GmbH

Hi everyone. Good to see you here in the last hours of the EIC. I hope you've had a wonderful time so far. I'm Zaïda. I've been working as a data scientist for the last couple of years, in the public healthcare sector, in a corporate, and now at Danube Tech. We're based in Vienna. For those of you who don't know Danube Tech, we've been working on a lot of open source software. We are co-editors of the W3C DID Core specification. Markus Sabadello has also been a pioneer in the field of digital identity and decentralized identity.

And the reason why I'm standing in front of you is because we provide easy interoperability for digital identity. So I'll talk about global decentralized identifier data; I'll call it global DID data for now. And I assume that you all know what a DID is. Who doesn't know what a DID is? Great.

Okay, so how do we get the data? Well, first of all, there are three possible DID transactions: the creation of a DID, the update of a DID, and the deactivation of a DID. Every time a DID transaction happens, it is stored on a ledger or a blockchain. And now the experts here will tell me that there are different ways to do this, because you can also, I don't know, do it locally, right? But all the DIDs that are based on distributed ledger technologies or blockchains are basically publicly visible.

So you can install your version tracker and store the data in your own database, and that's what we do, because we provide easy interoperability. So what are we then looking at?
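As an illustration of what a tracker might store per transaction, here is a minimal Python sketch. The class and field names (`DidTransaction`, `Operation`, `parse_method`) are hypothetical and are not taken from Danube Tech's actual database schema; only the three transaction types come from the talk.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum


class Operation(Enum):
    """The three DID transaction types named in the talk."""
    CREATE = "create"
    UPDATE = "update"
    DEACTIVATE = "deactivate"


@dataclass(frozen=True)
class DidTransaction:
    """One tracked transaction (hypothetical record layout)."""
    did: str
    method: str            # e.g. "ion" or "indy"
    network: str           # e.g. "testnet" or "mainnet"
    operation: Operation
    version_time: datetime
    document: dict         # the DID document as of this transaction


def parse_method(did: str) -> str:
    """Return the method name from a DID such as 'did:ion:abc'."""
    parts = did.split(":")
    if len(parts) < 3 or parts[0] != "did":
        raise ValueError(f"not a DID: {did!r}")
    return parts[1]


# Made-up example record using the reserved 'example' method.
tx = DidTransaction(
    did="did:example:123",
    method=parse_method("did:example:123"),
    network="testnet",
    operation=Operation.CREATE,
    version_time=datetime(2023, 5, 1, tzinfo=timezone.utc),
    document={"id": "did:example:123"},
)
```

Keeping the operation as an enum makes it easy to count creates, updates and deactivations separately later on.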

Well, there are different things you can look at. For sure the DID itself. Then there is the DID document, which has information about the verification methods, such as what type has been used, Ed25519 for example, or the publicKeyBase58 or @context field. There are a lot of different things that can be read from the data. Another thing is the version time: when was the DID created or, in case of an update, when was the DID updated? This is just an example of the things we can look at, but there is of course much more to see.
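The fields mentioned here can be read straight out of a DID document. The sketch below is only an illustration: `summarize_document` is a hypothetical helper, and the sample document is made up in the shape of the W3C DID Core examples, not taken from the data set in the talk.

```python
def summarize_document(doc: dict) -> dict:
    """Pull out the fields discussed in the talk from one DID document."""
    methods = doc.get("verificationMethod", [])
    return {
        "did": doc.get("id"),
        # JSON-LD documents are expected to carry an @context entry.
        "has_context": "@context" in doc,
        # The declared type of each verification method.
        "verification_types": [vm.get("type", "") for vm in methods],
        # Which key encodings appear, e.g. publicKeyBase58 or publicKeyJwk.
        "key_encodings": sorted(
            {name for vm in methods for name in vm if name.startswith("publicKey")}
        ),
    }


# Made-up sample document for illustration only.
sample = {
    "@context": "https://www.w3.org/ns/did/v1",
    "id": "did:example:123",
    "verificationMethod": [
        {
            "id": "did:example:123#key-1",
            "type": "Ed25519VerificationKey2018",
            "controller": "did:example:123",
            "publicKeyBase58": "H3C2AVvLMv6gmMNam3uVAjZpfkcJCwDwnZn6z3wXmqPV",
        }
    ],
}

summary = summarize_document(sample)
```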

So the first thing you might be interested in, and maybe it's also why you're sitting here: are DIDs actually being used? That's the question we wanted to answer. So I'll show you now a graph which shows all the DID transactions. I talked about global DID data, which means all the transactions of a particular method on a particular network that exist today. So it's not a subset, it's not a part, it's all the transactions on, for example, the did:ion testnet. There we see about 250,000 DID transactions.

Again, these are all DIDs that are either being created, updated or deactivated; in the case of did:ion, it's mostly create transactions. I didn't label all of them, but I will come back to this later on. Another thing we see, for example, is that the did:indy method on the Sovrin staging network has about, was it 10,000? Sorry, a hundred thousand DID transactions. So why am I showing this?
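Totals like these come down to counting tracked transactions per method and network. A minimal sketch, assuming the transactions are available as simple (method, network, operation) tuples; the rows below are made up and are not the figures from the talk.

```python
from collections import Counter

# Made-up rows: (method, network, operation).
rows = [
    ("ion", "testnet", "create"),
    ("ion", "testnet", "create"),
    ("ion", "testnet", "update"),
    ("ion", "mainnet", "create"),
    ("indy", "sovrin-staging", "create"),
    ("indy", "sovrin-staging", "deactivate"),
]

# Total transactions per (method, network), i.e. one line in the graph.
by_network = Counter((method, network) for method, network, _ in rows)

# Breakdown per operation type within each (method, network).
by_operation = Counter(rows)

for (method, network), total in by_network.most_common():
    print(f"did:{method} on {network}: {total} transactions")
```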

Well, there are several things you can read from this graph. First, the question I already mentioned: are DIDs actually being used? Well, here is one answer, on the test network.

Second, if you are a wallet provider, for example, or an application provider, you might want to know: which DID methods should I invest my time, money, and effort in? There are about 200 DID methods nowadays and you cannot support all of them, right? You need to make a decision: which DID method am I going to support?

Third, if you are a method provider, hi Fraser from cheqd, you might want to know: how is my DID method performing in comparison to other DID methods? You want to know how you're doing. And fourth, what I'm interested in from a data science perspective: is activity on the testnet somehow a predictor for future mainnet activity? So now keep looking at the red line, the ION testnet, and I'll show you all the DIDs on the mainnet.

So this is about one fifth of the DIDs on the test network, like 50,000 DID transactions in total on the ION mainnet. I don't have all the internal information here. So for example, it might be interesting to know what happened in, what is it, April 2023? Maybe there was a release, maybe there was something. Is there anyone from Microsoft who can tell me what happened there? No one.

Okay, good. I also ran some analysis on, it was I think June 2022, when Microsoft Entra got released. But right, to answer the question: could current testnet activity be a predictor for future mainnet activity? Could we see some seasonality? For that, we don't have enough data yet. But maybe in the future, as DIDs are being used more and more, hopefully we could do some predictions and some analyses.

So another thing I want to point out here is that I promised global DID data, but as you see, and maybe the experts here know, with did:ethr, creating a public/private key pair happens locally. So you cannot really track that. So all the did:ethr transactions are either updates or deactivates. Then there are some other DIDs at the bottom, did:indy on the Sovrin network or did:cheqd; compared to ION, it's a bit less. From that, I want to go into verification methods.

So as I showed you in the beginning, the verification method is something that is in the DID document, and you can look at what kind of cryptographic algorithms are being used. To show you a bit of a boring figure, I would say: did:indy IDunion, so the did:indy method on the IDunion network. There's only one verification method, which is Ed25519. There are different motivations here, right? Personally, I like it; it's solid, it works.

So having just one solid verification method is good, but maybe there are other people who might say, well, it's maybe better to support some other verification methods as well, for security reasons. Now I'll show you another figure, on the ION mainnet. You might be surprised to see all these different verification methods, but if you look close enough, you see that, I mean, what's this empty E D two? Is that okay? Sorry.

So there's a bit of rounding here, but there were actually people that used a wallet type as a verification method, or they used an empty value as a verification method. What does that mean?

Well, if you are, yeah, still working, if you are a method provider and you see that your method is being used in a wrong way, so people are inputting a wrong verification method, it might be a problem. You might think: okay, I might have to change something in my spec, because trying to use a DID with a false verification method is problematic, in my opinion, from the perspective of an application provider.
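A distribution like the one in this figure, including the empty or unexpected entries, can be sketched as follows. The `KNOWN_TYPES` set is an assumption for illustration: it lists a few verification method types from the W3C DID specification registries and is not complete, and it is not necessarily the list used in the analysis presented here.

```python
from collections import Counter

# Illustrative, incomplete allowlist (assumption, see lead-in).
KNOWN_TYPES = {
    "Ed25519VerificationKey2018",
    "Ed25519VerificationKey2020",
    "EcdsaSecp256k1VerificationKey2019",
    "JsonWebKey2020",
    "X25519KeyAgreementKey2019",
}


def type_distribution(documents):
    """Count how often each verification method type occurs."""
    counts = Counter()
    for doc in documents:
        for vm in doc.get("verificationMethod", []):
            # A missing type is counted as the empty string.
            counts[vm.get("type", "")] += 1
    return counts


def suspicious_types(counts):
    """Types that are empty or not in the illustrative allowlist."""
    return sorted(t for t in counts if t not in KNOWN_TYPES)


# Made-up documents: one regular type, one empty type, one wallet-like type.
docs = [
    {"id": "did:example:1", "verificationMethod": [{"type": "Ed25519VerificationKey2018"}]},
    {"id": "did:example:2", "verificationMethod": [{"type": ""}]},
    {"id": "did:example:3", "verificationMethod": [{"type": "Wallet"}]},
]

counts = type_distribution(docs)
flagged = suspicious_types(counts)
```

A type that is flagged here is not automatically wrong; it only means it is worth a closer look against the method's specification.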

Again, you might think: we care a lot about verification and we want to make sure that this happens in the correct way; if we now support ION mainnet, there are going to be some errors here. It might help you to make a decision on which DID method you should support. Another thing is errors. There's no such thing as a gold standard for errors in DIDs or DID documents, but at Danube Tech we did our own classification of errors. And the thing is that if a DID is on the blockchain, it can be resolved and it exists, right?

So these errors are not errors such that the DID doesn't exist or couldn't be created. No, the DIDs are created, they can be resolved, but the errors basically mean that they do not conform to the specification. In case of a JSON-LD error: the JSON-LD needs to have this @context field I showed you in the beginning. In case of a version time error, it might be that the DID was created in the future, 2030, or in 1989, which is not possible.

So, right, to show you the distribution here, we made a distinction. Of all the DID transactions on, for example, the did:ebsi conformance network, 75% had an error, and of those 75%, a hundred percent had a JSON-LD error. So all of those DIDs didn't have this @context field by default. Another example here is did:indy Danube, so the did:indy method on the Danube network: 12% had an error, and of those 12% there was some variety.
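The two error types described above, and the "share with an error, then breakdown by error type" view, can be sketched like this. This is a simplified illustration and not Danube Tech's classification: the function names, the error labels and the `EARLIEST_PLAUSIBLE` cutoff are assumptions, and the records are made up.

```python
from collections import Counter
from datetime import datetime, timezone

# Assumed cutoff: version times before this are treated as implausible.
EARLIEST_PLAUSIBLE = datetime(2015, 1, 1, tzinfo=timezone.utc)


def classify_errors(doc, version_time, now):
    """Return a tuple of error labels for one DID document."""
    errors = []
    if "@context" not in doc:
        errors.append("jsonld")        # JSON-LD document without @context
    if version_time > now or version_time < EARLIEST_PLAUSIBLE:
        errors.append("version_time")  # e.g. dated 2030 or 1989
    return tuple(errors)


def error_summary(records, now):
    """Return (share of records with any error, counts per error combination)."""
    combos = Counter()
    for doc, version_time in records:
        errors = classify_errors(doc, version_time, now)
        if errors:
            combos[errors] += 1
    rate = sum(combos.values()) / len(records) if records else 0.0
    return rate, combos


now = datetime(2023, 5, 12, tzinfo=timezone.utc)
ok_time = datetime(2023, 1, 1, tzinfo=timezone.utc)

# Made-up records: (DID document, version time).
records = [
    ({"@context": "https://www.w3.org/ns/did/v1", "id": "did:example:1"}, ok_time),
    ({"id": "did:example:2"}, ok_time),
    ({"id": "did:example:3"}, ok_time),
    ({"id": "did:example:4"}, datetime(1989, 1, 1, tzinfo=timezone.utc)),
]

rate, combos = error_summary(records, now)
```

Counting combinations rather than single labels keeps cases like "JSON-LD error plus version time error" visible as their own group.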

So again, a lot of JSON-LD errors, but not only; we also saw combinations of JSON-LD errors and dereferencing errors or verification method errors, as I showed you on a previous slide. I will come back to this later on, but the last thing I want to show you from my personal graphs is duplicate keys. A duplicate key is not necessarily an issue, it's not an error, but it might be a security problem. You have to believe me that where I wrote down "key 3", the exact same key was seen.

So across different DIDs on the same method, but also across different DID methods, we saw the exact same key being used. These are all DIDs on the ION testnet, right? This is again not necessarily an error, but it could be a security issue. DHS, for example, didn't want to have the exact same key being used across different DIDs. So you have to do some sort of detection.

And again, from the perspective of a wallet provider or an application provider, you might also not want to have that. Or, from the perspective of a method provider, if you see that duplicate keys are being detected, you might want to change something in your spec, right? The same was seen on the EBSI DID method, but also across actually all the other DID methods.
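One possible form of the duplicate key detection mentioned above is to group DIDs by their public key material. The sketch below is an assumption about how this could be done, not the method used in the talk; `find_duplicate_keys` is a hypothetical helper, it only compares string-encoded keys (`publicKeyBase58`, `publicKeyMultibase`), and the key values are made up.

```python
from collections import defaultdict


def find_duplicate_keys(documents):
    """Map key material to the DIDs using it, for keys shared by more than one DID."""
    dids_by_key = defaultdict(set)
    for doc in documents:
        for vm in doc.get("verificationMethod", []):
            # Only string-encoded keys are compared in this sketch;
            # publicKeyJwk objects would need to be normalized first.
            key = vm.get("publicKeyBase58") or vm.get("publicKeyMultibase")
            if key:
                dids_by_key[key].add(doc["id"])
    return {key: sorted(dids) for key, dids in dids_by_key.items() if len(dids) > 1}


# Made-up documents; "KEY3" is reused across two different methods.
docs = [
    {"id": "did:example:a", "verificationMethod": [{"publicKeyBase58": "KEY3"}]},
    {"id": "did:other:b", "verificationMethod": [{"publicKeyBase58": "KEY3"}]},
    {"id": "did:example:c", "verificationMethod": [{"publicKeyBase58": "KEY7"}]},
]

duplicates = find_duplicate_keys(docs)
```

Because the grouping ignores the DID method, it also catches the same key being reused across different methods.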

So, okay, I've shown you all these graphs, which were nicely done on my own personal laptop, but I also want to show you something more interesting, because you can also visit this website, it's called stat com, or scan my QR code later on. A colleague at Danube Tech and I created a dashboard which is live; every night it gets updated, and you can basically see everything I told you before. We have stratified the data by month, by year, and by the last three months. You can click on the different, what was it called?

Like, methods. And you can opt different methods in and out. What you can also do is go to a method-specific page. As you can see on the top right, we have different DID methods as well. So here is the ION DID method, and we have the data stratified by the number of DIDs that were created, but also by the number of DIDs that were updated and deactivated. It's basically an overview, which is free. You can visit it and look at it yourself. Another thing we also did was compare within a network.

So in the top right corner you can select the ION testnet, the ION mainnet or even other DID methods, and you can compare the different networks with each other. Here, on the left, I merged all the DIDs and summed them over one month, per week. And in the figure to the right of it you can see the same, but then I compare the different networks per week.
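The per-week comparison between networks amounts to bucketing transactions by ISO week. A minimal sketch, assuming each event is a (network, date) pair; `weekly_counts` is a hypothetical helper, the data is made up, and this is not how the dashboard is actually implemented.

```python
from collections import Counter
from datetime import date


def weekly_counts(events):
    """Count events per (network, ISO year, ISO week)."""
    counts = Counter()
    for network, day in events:
        iso_year, iso_week, _ = day.isocalendar()
        counts[(network, iso_year, iso_week)] += 1
    return counts


# Made-up events: (network, date of the transaction).
events = [
    ("testnet", date(2023, 5, 1)),
    ("testnet", date(2023, 5, 3)),
    ("testnet", date(2023, 5, 8)),
    ("mainnet", date(2023, 5, 2)),
]

counts = weekly_counts(events)

for (network, year, week), n in sorted(counts.items()):
    print(f"{network} {year}-W{week:02d}: {n}")
```

Using the ISO year together with the ISO week avoids mixing up weeks that straddle a year boundary.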

Another thing, right, this is also interesting: ION has, like, almost a hundred percent and a hundred percent error rate, which means that practically all the DIDs have an error, but as you can probably guess, most of them are again JSON-LD errors. And the last thing I also showed you was the verification method variety. So there's a distribution of different verification methods. You can go through the different methods yourself and check it.

So, right, this is cheqd: cheqd mainnet, cheqd testnet, and you can opt them in or out to look at them a bit better. This week, this month, there were not so many did:cheqd DIDs created. And, right, it also has a hundred percent error rate. Fraser, what happened there? I have no idea. A lot of dereferencing errors.

Oh, also a lot of JSON-LD errors, right? So yeah, if you could not visit, then here's the QR code. Let me know if there are any questions, and thank you very much for your attention.

Okay, go ahead. Thank you so much for the presentation. Do we have any questions in the audience?

Hi, Andre for me. So what is your roadmap for including further verifiable data registries, like the networks that you mentioned on the Godiddy platform? Do you have a roadmap? I think there's a growing ecosystem, right? So you probably want to have more of that.

Yeah, so we want to support multiple; I think we're also working on supporting different DID methods as well. On this platform, you mean, or on the Godiddy platform?

Yeah, I mean on your Godiddy. What is the roadmap for growing the other DID methods that you monitor with your technology to check for errors? I mean, this is very useful, because everyone always assumes, this is kind of the blockchain world, everything is safe and great and there are no errors there, this is very reliable. But you are now pointing out that there are errors, and obviously I think people are eager to understand how you match against the errors. So how do you find the errors?

Okay, and what's the roadmap, what do you want to do on onboarding further verifiable data registries on this? Because I think it's extremely useful to look at that, and that's kind of my thoughts on that.

Yeah. So if you're a method provider, you can come to us and tell us, hey, I want my DID method to be added to your platform, and then we can look at it. We're also working on it ourselves currently, so adding multiple methods to the platform and installing version trackers for them. And the second question was errors, right? How do we track them?

Yeah, well, what's the logic that you apply to check for errors? Because obviously you have to look at the spec and then compare all the records against the spec, and then...

Yeah, yeah. That's basically how we do it. The spec will probably also be updated, right? So we will have to update our way of defining errors, but for now we look at the spec and we see: okay, if it does not conform to the spec, it's this type of error. That's how we currently classify our errors, right?

Yeah. Okay. Thanks. Makes sense.

Yeah, I have a question. As a data scientist, do you see, from your knowledge in the field, that the values of privacy are preserved?

Meaning, if I'm using verifiable credentials, if my public key is on a method, do you see, from your knowledge, an ability to correlate my activity and transactions? Or is it preserved?

Sorry, I didn't understand your question. If I see, from my perspective as a data scientist...?

The type of data that you interact with: do you see the potential to correlate private individuals' activity? Like, oh, you traveled here, you shopped there. Or do you see that the implementations preserve privacy and prevent this correlation and tracking?

Well, for DIDs, basically you don't want to have any privacy-sensitive data on the chain, right? So in the ideal case, you don't do that.

Oh, okay. Yeah.

So that would be an issue. But if the DID method is being used correctly and you put the right data on the chain, I think that's a safe way to do it.

Second, there are also a lot of different DID methods. So you can also choose one that is not based on ledger or blockchain technologies, so there you can basically choose your privacy yourself. Is that answering your question?

Yeah, thank you.

I have another one that's directly related to that. So currently you're obviously only monitoring DIDs. On the Indy ledgers you have other data objects as well, which could be useful to monitor, particularly in a compliance violation scenario, so that you can basically find out that there could be a potential GDPR issue with some of the data records that are entered in there. Because this is obviously a key concern of everyone operating the ledgers.

So, just thoughts on that. I mean, this is obviously not exactly comparable to a spec, but you could basically factor in some kind of heuristic and say, okay, this looks like the name of some person and it could be a GDPR violation. So, a red flag.

Yeah, a good one to add to the error detection, right, to have that as well. If we somehow see privacy-sensitive data, then it should be detected. We are not the DID method creators, though; I would say the DID method creators should have a look at that. We can of course detect it and analyze it, but we cannot change the method. But yeah, that's a good topic.

Do we have any more questions? We have one minute left.

Okay, I think that's it then. We are right on time. Thank you so much, Zaïda, for your presentation.
