How do I use Google's Indexing API?

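I recently set up a site and submitted its sitemap to Search Console, but it kept reporting that the sitemap could not be read. I checked the sitemap several times and found nothing wrong with it; Baidu, Bing and others index the site normally, and the result was the same even with a sitemap generator I wrote myself. I later found out that Google also supports submitting URLs through an API, so I gave it a try.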

beautifulsoup4==4.11.1
bs4==0.0.1
cachetools==5.2.0
certifi==2022.6.15
charset-normalizer==2.1.0
google-api-core==2.8.2
google-api-python-client==2.55.0
google-auth-httplib2==0.1.0
google-auth==2.9.1
googleapis-common-protos==1.56.4
httplib2==0.20.4
idna==3.3
oauth2client==4.1.3
pip==21.3.1
protobuf==4.21.4
pyasn1-modules==0.2.8
pyasn1==0.4.8
pyparsing==3.0.9
requests==2.28.1
rsa==4.9
setuptools==60.2.0
six==1.16.0
soupsieve==2.3.2.post1
uritemplate==4.1.1
urllib3==1.26.11
wheel==0.37.1
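  • Install the dependencies listed in requirements.txt
pip install -r requirements.txt
  • Edit the code so that the paths point to your own JSON key file and your own sitemap

  • The code is as follows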

https://github.com/emperinter/GoogleIndexAPI

from oauth2client.service_account import ServiceAccountCredentials
import httplib2
import json
import requests as req
from bs4 import BeautifulSoup

def index(url):
    SCOPES = [ "https://www.googleapis.com/auth/indexing" ]
    ENDPOINT = "https://indexing.googleapis.com/v3/urlNotifications:publish"

    # JSON_KEY_FILE is the private key file that you created for your service account.
    JSON_KEY_FILE = "your-index-api.json"

    credentials = ServiceAccountCredentials.from_json_keyfile_name(JSON_KEY_FILE, scopes=SCOPES)

    http = credentials.authorize(httplib2.Http())

    # Build the request body as a JSON string.
    # json.dumps escapes any special characters in the URL.
    # This example sends a simple update request (URL_UPDATED).
    body = json.dumps({"url": url, "type": "URL_UPDATED"})

    response, content = http.request(ENDPOINT, method="POST", body=body)
    return response

all_link = []
origin_url = 'https://your-sitemap-url.xml'
r = req.get(origin_url)
bs = BeautifulSoup(r.content, 'html.parser')  # parse the sitemap
hyperlink = bs.find_all(name='loc')  # add attribute filters here if needed (see the BeautifulSoup docs); so far I have only tested attrs={'alt': ''}
for h in hyperlink:
    hh = h.string
    all_link.append(hh)

all_link.reverse()

sent = []

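# Load the URLs that have already been submitted.
# sent.txt must already exist (create an empty file before the first run).
with open("sent.txt", "r") as fo:
    print("File name: ", fo.name)

    for line in fo:  # read the file line by line
        line = line.strip()  # strip leading and trailing whitespace
        sent.append(line)  # remember this URL as already sent
        print("Read: %s" % (line))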

for link in all_link:
    if link not in sent:
        print(link)
        res = index(link)
        if res.get("status") == "200":
            # Record the URL so it is skipped on the next run
            with open("sent.txt", 'a+') as f:
                f.write(str(link) + '\n')  # one URL per line
        else:
            # Stop at the first non-200 response
            print(res)
            break
    else:
        print(str(link) + ' has already been sent')
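
The sitemap-parsing and de-duplication steps of the script above can also be sketched using only the standard library, which makes them easy to test without network access or credentials. This is a minimal sketch, not the code from the repository: the helper names `extract_locs`, `pending_urls` and `build_body` are hypothetical, and it assumes the sitemap uses the standard sitemaps.org namespace.

```python
import json
import xml.etree.ElementTree as ET

# Assumption: the sitemap declares the standard sitemaps.org namespace.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def extract_locs(sitemap_xml):
    """Return the text of every <loc> element in a sitemap document."""
    root = ET.fromstring(sitemap_xml)
    return [
        loc.text.strip()
        for loc in root.findall(".//sm:loc", SITEMAP_NS)
        if loc.text
    ]


def pending_urls(all_urls, already_sent):
    """Keep only the URLs that have not been submitted yet, in order."""
    sent = set(already_sent)
    return [u for u in all_urls if u not in sent]


def build_body(url, kind="URL_UPDATED"):
    """Serialize a notification body; json.dumps takes care of escaping."""
    return json.dumps({"url": url, "type": kind})


if __name__ == "__main__":
    # Hypothetical sample data, used only to show the flow.
    sample = (
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
        "<url><loc>https://example.com/a</loc></url>"
        "<url><loc>https://example.com/b</loc></url>"
        "</urlset>"
    )
    for url in pending_urls(extract_locs(sample), ["https://example.com/a"]):
        print(build_body(url))
```

Each string returned by `build_body` could be passed as the `body` argument of the `http.request` call in `index`. Using a set for the already-sent URLs keeps the membership check fast when sent.txt grows large.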
