
Yandex allegedly had a boatload of source code from across its technology leaked by a disgruntled employee, and part of that was the source code for Yandex Search, Russia’s largest search engine. As you can imagine, SEOs and others are diving in to see what they can learn from the source code.
I personally did not download the source code, so I did not go through it myself, but I wanted to share what people found and posted on Twitter from their investigations of the source code.
Here's the alpha version of an explorer tool for the leaked #Yandex Search code.
It lets you browse through the ranking factors, view by tags, etc., and start to find connections.
Easy to add new features if there’s something you want to see! https://t.co/AjbYnrDl9P pic.twitter.com/pQ4scOkP6w
— Rob Ousbey : @RobOusbey@mastodon.social (@RobOusbey) January 28, 2023
I downloaded the code, analyzed it, and there’s a lot of useful information for Google SEO as well. pic.twitter.com/RWrgnnlpj6
— Alex Buraks (@alex_buraks) January 27, 2023
Theoretically, what’s the difference between the algorithms used in Google and in Yandex?
They’re quite similar:
– there’s a RankBrain analogue, MatrixNet;
– they’re using PageRank (almost the same as in Google);
– a lot of text algorithms are the same. pic.twitter.com/Djjl8Bmjwn
— Alex Buraks (@alex_buraks) January 27, 2023
According to Statcounter, Yandex is close to Yahoo and Bing by market share: pic.twitter.com/5GKIvKIvAo
— Alex Buraks (@alex_buraks) January 27, 2023
Main insights after analysing this list:
#1 Age of links is a ranking factor. pic.twitter.com/U47uWvEq9w
— Alex Buraks (@alex_buraks) January 27, 2023
#3 Numbers in URLs are bad for rankings pic.twitter.com/ECgwGeGUfb
— Alex Buraks (@alex_buraks) January 27, 2023
#5 Hard pessimization equals PR=0 pic.twitter.com/RRbhuJyZr1
— Alex Buraks (@alex_buraks) January 27, 2023
#7 Fun fact – there’s a separate ranking factor for uplifting Wikipedia pic.twitter.com/799F8KFpkE
— Alex Buraks (@alex_buraks) January 27, 2023
#9 Document age and last update are both ranking factors. pic.twitter.com/ay1GTMVEtJ
— Alex Buraks (@alex_buraks) January 27, 2023
Right now I have checked ~40% of the list; there are a lot more (about text relevancy, behavior factors, page rank, internal links, etc.).
Will continue this thread after some time.
— Alex Buraks (@alex_buraks) January 27, 2023
The first thread got a lot of impressions (500k views at the moment, thanks for your retweets and likes!), so I decided to finalize. https://t.co/UQiQsnpWd2
— Alex Buraks (@alex_buraks) January 28, 2023
#2 Additionally: there is a ranking factor for orphan pages.
You can easily find them via Screaming Frog or other crawlers. pic.twitter.com/zIPwAelpD0
— Alex Buraks (@alex_buraks) January 28, 2023
#4 Number of search queries for your site/URL is a ranking factor.
Obviously more = better. pic.twitter.com/xXQ6FMDghP
— Alex Buraks (@alex_buraks) January 28, 2023
#6 If your URL is the last one in a search session (the user found what they needed), it will influence rankings.
There are strict factors for this and predictive factors as well. pic.twitter.com/Zx3sBZORCs
— Alex Buraks (@alex_buraks) January 28, 2023
#8 Special ranking factors for short videos (TikTok, Shorts, Reels) pic.twitter.com/oKPzL09MID
— Alex Buraks (@alex_buraks) January 28, 2023
#10 Keywords in the URL are a ranking factor.
As we can see from the description, the optimum is to include up to 3 words from the search query. pic.twitter.com/Q1euKWSiST
— Alex Buraks (@alex_buraks) January 28, 2023
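As a rough illustration of the idea in the tweet above, the following sketch counts how many words from a search query appear in a URL and caps the count at three. The function name, the tokenization, and the cap constant are assumptions made for this example; none of them come from the leaked code.

```python
import re

# Assumption: mirrors the "up to 3 words" reading of the tweet above.
MAX_USEFUL_MATCHES = 3


def query_words_in_url(query: str, url: str) -> int:
    """Count distinct query words that appear as tokens in the URL, capped at 3."""
    # Split both strings into lowercase alphanumeric tokens.
    url_tokens = set(re.findall(r"[a-z0-9]+", url.lower()))
    query_tokens = set(re.findall(r"[a-z0-9]+", query.lower()))
    return min(len(query_tokens & url_tokens), MAX_USEFUL_MATCHES)
```

Under these assumptions, a URL that repeats five query words scores the same as one that contains three, which is one way to read "up to 3 words".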

#14 One more ranking factor for content quality – broken embedded video on the page.
Embedded videos – good for rankings.
Broken embedded videos – bad. pic.twitter.com/2SUys65PHp
— Alex Buraks (@alex_buraks) January 28, 2023
#16 If your backlink anchors contain all the words from the keywords, it's good for SEO.
If they are all in one link, it's more beneficial. Especially if the order of the words is the same. pic.twitter.com/WrbESJ8Da5
— Alex Buraks (@alex_buraks) January 28, 2023
#18 The quality rank of texts on the domain is a ranking factor.
Pages with low quality content affect the entire domain. pic.twitter.com/MJUCTVB9CH
— Alex Buraks (@alex_buraks) January 28, 2023
#20 Funny, there’s a random value as a separate ranking factor.
When you don't understand why some page is on top, it could be just random (to test behavior factors). pic.twitter.com/TGtzFrmBOV
— Alex Buraks (@alex_buraks) January 28, 2023
#22 Backlinks from the top 100 best websites by PageRank impact rankings.
That's not news. pic.twitter.com/ikxldWLJqy
— Alex Buraks (@alex_buraks) January 28, 2023
Wow, I just found the list with the initial weights of Yandex ranking factors.
Do you need one more thread? 😁
P.S. The final weights are calculated by AI (MatrixNet), but the initial values are useful as well. pic.twitter.com/WeroYQy7Yu
— Alex Buraks (@alex_buraks) January 28, 2023
That said, I've been digging into the codebase myself to find things of interest.
I'm doing this live, so I don't know how long it will take between tweets.
— Mic King (@iPullRank) January 27, 2023
A lot of the code related to Yandex Search lives in the Kernel, ExtSearch, Search, and Robot archives, but again I won't be able to be comprehensive here until I’ve looked through everything.
— Mic King (@iPullRank) January 27, 2023
Some really interesting things in the web_meta_factors_info/factors_gen.in file as it relates to content features and factors.
For instance, some things that we would expect, like a minimum expectation of the proximity of words in a title to the words in the query. pic.twitter.com/YRsrCpVsqU
— Mic King (@iPullRank) January 27, 2023
Interestingly, there are a lot of scrapers in here: Google News, Shopping, YouTube, and even other Yandex services.
— Mic King (@iPullRank) January 27, 2023
Hmm…this might be the structure of how Yandex stores documents in their version of a document server.
Still looking for an idea of how they structure their inverted index. pic.twitter.com/1lwTbOirnx
— Mic King (@iPullRank) January 27, 2023
Here's a protobuf of link factors. pic.twitter.com/1RM6o1xzRg
— Mic King (@iPullRank) January 27, 2023
In the “link prioritizer code” they talk about decreasing the priority of links with the same text from the same host. In other words, don't count the links from duplicate content. pic.twitter.com/dQTUnScCUy
— Mic King (@iPullRank) January 27, 2023
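To make the link prioritizer idea above concrete, here is a minimal sketch that lowers the priority of repeated links sharing the same anchor text from the same host. It is not the leaked code: the function name, the input shape, and the `repeat_penalty` value are all hypothetical, chosen only to illustrate the behavior described in the tweet.

```python
from collections import defaultdict
from urllib.parse import urlparse


def prioritize_links(links, repeat_penalty=0.5):
    """Assign a priority to each (source_url, anchor_text) pair.

    The first link with a given anchor text from a given host keeps
    priority 1.0. Each further link with the same text from the same host
    is multiplied by repeat_penalty once more (a hypothetical value).
    """
    seen = defaultdict(int)  # (host, normalized anchor text) -> times seen so far
    result = []
    for source_url, anchor_text in links:
        host = urlparse(source_url).netloc.lower()
        key = (host, anchor_text.strip().lower())
        priority = repeat_penalty ** seen[key]
        seen[key] += 1
        result.append((source_url, anchor_text, priority))
    return result
```

With this sketch, a site-wide footer link repeated across many pages of one host quickly contributes very little, while the same anchor text from a different host still counts in full.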
How did y’all come up with that number of ranking factors?
I see 481 factors just related to “Speedy Clicks” pic.twitter.com/sw5A3ia3Bk
— Mic King (@iPullRank) January 28, 2023
Like the Googs, Yandex has multiple ranking models to choose from.
In this select_ranking_models.cpp file, they talk about having different models for different languages and regions. pic.twitter.com/m210tpOUDb
— Mic King (@iPullRank) January 28, 2023
I'm gonna go watch TV, but I obviously have to add this to my book, so I'm gonna add more over the next couple of days
— Mic King (@iPullRank) January 28, 2023
Been digging into how this robot archive is structured.
It looks like the Zora directory is where a lot of interesting things are happening. There's a limits.pb.txt file that stores the requests per second rate for the host and the IP address for 204k hosts. pic.twitter.com/0oulKm58dx
— Mic King (@iPullRank) January 28, 2023
Here's where the Document and Query factors are collected and scored.
Looks like it goes to storage after this tho. pic.twitter.com/qJAiLfSrsU
— Mic King (@iPullRank) January 29, 2023
Okay, real quick, the top 5 most positively and negatively weighted ranking factors and their coefficients in the initial weighting in Yandex’s document relevance calculation. Negatives first
#1 FI_ADV: -0.2509284637
This factor determines that there is advertising on the site.
— Mic King (@iPullRank) January 29, 2023
#3 FI_QURL_STAT_POWER: -0.1943768768
Factor is the number of URL impressions for the request
— Mic King (@iPullRank) January 29, 2023
#5 FI_GEO_CITY_URL_REGION_COUNTRY: -0.168645758
Factor is the geographical coincidence of the document and the country that the user searched from.
Okay, now for the top 5 positively weighted factors.
— Mic King (@iPullRank) January 29, 2023
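To show what a negative initial coefficient would mean in practice, the sketch below combines the three coefficients quoted above in a plain weighted sum. This is only an illustration under stated assumptions: it assumes factor values are already normalized numbers and that the initial weights combine linearly, and the function name and input dictionary are hypothetical. As noted earlier in the thread, the final weights are calculated by MatrixNet, so this is not how Yandex produces its final ranking.

```python
# Initial coefficients quoted in the tweets above (negatively weighted factors only).
INITIAL_WEIGHTS = {
    "FI_ADV": -0.2509284637,
    "FI_QURL_STAT_POWER": -0.1943768768,
    "FI_GEO_CITY_URL_REGION_COUNTRY": -0.168645758,
}


def initial_linear_score(factor_values):
    """Weighted sum of factor values using the quoted initial coefficients.

    factor_values maps factor names to numbers (assumed normalized);
    factors missing from the input are treated as 0.0.
    """
    return sum(
        weight * factor_values.get(name, 0.0)
        for name, weight in INITIAL_WEIGHTS.items()
    )
```

Under these assumptions, a document whose FI_ADV value is 1.0 and whose other two factors are 0.0 gets a contribution of -0.2509284637 from this part of the sum.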
Here’s a starting point for link related factors. https://t.co/fwP8TxuOrM
— Christoph C. Cemper 🇺🇦 🧡 web optimization (@cemper) January 30, 2023
Will this help you do SEO on Google? Probably not, but hey, it’s super interesting.
Ah, but once they find the optimum word count …
BOOM
— John Mueller is watching out for Google+ 🐀 (@JohnMu) January 29, 2023
Forum discussion at WebmasterWorld.