Android App Automation with Appium

2020-07-28

Recently, while scraping WeChat official accounts, a follow-up requirement came up. To scrape accounts in batches, I first need the list of official accounts I follow. The downstream automation is already complete, so once I have the Chinese names of all the accounts, the whole pipeline works end to end. The problem is that the Chinese names are not easy to get. Should I type them in by hand, one by one? I scrolled down the list: 346 of them. At one every 10 seconds, that is about an hour of mechanical typing. No thanks, that is too awful.

So, is there an automated way? WeChat is now locked down tight on both PC and mobile. For a tool user like me, even Fiddler is no help, since WeChat does not talk plain HTTP, so intercepting traffic is a dead end. Normally I would reach for Frida/Xposed at this point, but the target is WeChat: I have already had one account banned for using plug-ins, and the account I need the data from is my main account, so I dare not take risks. That leaves non-invasive methods only, and after looking around, working at the UI level is the safest. This article briefly covers how to use Appium ¹, an application test automation framework, to read any data shown on any Android app screen in a **non-root environment**.

Summary of Data Capture Methods

Readers may wonder: we said we are reading data off an app's screen, which sounds like a crawler, so why are we talking about automated testing? This comes down to how Android works internally. What we are doing is controlling, from one process, how another process runs, and that requires API support from the Android system. Android provides two components that fit (at least, these are the two I came up with during my research): the Accessibility component built for users with disabilities (used by screen readers, voice assistants, and the like), and the UIAutomator component built for testers (used for automated UI testing).

For Accessibility, there is no ready-made solution; after searching around, I would have to write code myself. And if I am writing code anyway, why not write Python? A Reddit post mentions that Tasker, combined with assorted plug-ins ², can read the contents of the screen, and according to the replies there is a plug-in [^ac-ta] that really can read on-screen text. But it still feels fairly immature, and building the rest of the workflow on the phone in Tasker's awful GUI is painful just to think about. Tasker also costs money, and even if I got it working, it would not be worth writing up and the experience would not be reusable.

For UIAutomator, a casual search turns up a pile of frameworks [^ua-ls], such as UIAutomator, Selendroid, and Espresso. Appium does not even make the first page of Google results; I ended up choosing it only because it appeared in one of the articles [^ua-zh] I found when searching for the keyword 微信自动化 (“WeChat automation”). Maybe everyone else is as limited as I am and only writes Python.

I chose Appium for two main reasons, and I am not ashamed to admit them. First, the server (the host that controls the phone) has a graphical interface [^appium-inst]: download an exe and double-click to start it. Second, the client (which calls the API to control the phone) can be a Python script. Java, JS, Ruby, C#, PHP, and other languages are officially supported as well; in essence each client just wraps the server's REST API, as a glance at the documentation [^appium-doc] shows. Appium has other advantages too. For example, it is a higher-level wrapper around UIAutomator and Espresso, and the client can pass a parameter to choose UIAutomator [^appium-ua] or Espresso [^appium-es] as the driver. I leave the rest for the reader to discover.
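To make the "clients just wrap a REST API" point concrete, here is a minimal sketch that talks to the server's status endpoint using only the Python standard library. The base URL is the one used by the script later in this article; the shape of the JSON response (a `value.build.version` field) is an assumption about what an Appium 1.x server returns, so the helper falls back to "unknown" if the keys are missing.

```python
import json
import urllib.request

# Assumption: the Appium desktop server is listening on its default address.
BASE_URL = "http://localhost:4723/wd/hub"


def status_url(base_url: str = BASE_URL) -> str:
    """Build the URL of the server's status endpoint."""
    return base_url.rstrip("/") + "/status"


def build_version(payload: dict) -> str:
    """Pull the server version out of a decoded status response.

    The key layout is an assumption; "unknown" is returned if it differs.
    """
    value = payload.get("value") or {}
    build = value.get("build") or {}
    return str(build.get("version", "unknown"))


def fetch_status(base_url: str = BASE_URL, timeout: float = 5.0) -> dict:
    """GET the status endpoint and decode the JSON body (needs a running server)."""
    with urllib.request.urlopen(status_url(base_url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the server running, `build_version(fetch_status())` should report which Appium build you are talking to; every client library is ultimately issuing HTTP requests like this one.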

Dependency installation

Appium's architecture is split into a server and a client. The server is a patchwork of components that runs on a computer (Windows, macOS, and Linux are supported), communicates with devices (such as Android phones or emulators), and exposes UI automation through a Web API. It can be downloaded from the official GitHub ³ and unpacked. Android debugging, however, needs some additional dependencies.

My environment is Windows 10 with Chocolatey ⁴ (a package manager) installed, so all the dependencies can be installed with the command below. Put simply, they are the Android SDK, adb, and a JDK.

choco install AndroidStudio adb adoptopenjdk11

Also make sure the ANDROID_SDK_ROOT and JAVA_HOME environment variables are set correctly.

ANDROID_SDK_ROOT=%USERPROFILE%\AppData\Local\Android\Sdk
JAVA_HOME=C:\Program Files\AdoptOpenJDK\jdk-11.0.8.10-hotspot
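A quick way to confirm the two variables before launching Appium is a small check script. This is only a sketch: the function name `check_env` and the injectable `exists` callback are my own choices for illustration, not part of any Appium tooling.

```python
import os
from typing import Callable, Dict, Mapping

REQUIRED_VARS = ("ANDROID_SDK_ROOT", "JAVA_HOME")


def check_env(env: Mapping[str, str],
              exists: Callable[[str], bool] = os.path.isdir) -> Dict[str, str]:
    """Return {variable: problem} for each variable that is unset or
    points to a directory that does not exist."""
    problems = {}
    for name in REQUIRED_VARS:
        value = env.get(name, "")
        if not value:
            problems[name] = "not set"
        elif not exists(os.path.expandvars(value)):
            problems[name] = f"directory not found: {value}"
    return problems


if __name__ == "__main__":
    # Prints nothing when both variables look fine.
    for name, problem in check_env(os.environ).items():
        print(f"{name}: {problem}")
```

The `exists` parameter is there so the logic can be exercised without touching the real file system.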

After the above steps are completed, start the Appium program and click the big blue button to start the service.

Write client script

Next, we write a script that reads the contents of the WeChat official account list. At the data level, the interface of an Android application is also a tree, expressed as XML, much like the DOM tree we deal with when writing a web crawler. To get the data held by a particular component on a particular screen, we can likewise locate the component with XPath and then extract what we need from the matched element. Chromium has developer tools. Does Appium? It does too!

Now let's launch Appium's “developer tools”. Appium's configuration is rather obscure, mainly because it uses a lot of unfamiliar terms; the startup configuration parameters, for example, are called Desired Capabilities.

From the main window, open a Session Window, enter the following configuration in the JSON Representation of the Desired Capabilities, and save it. Remember to replace the device ID with the one shown by adb devices. One crucial point: **do not leave out the noReset parameter under any circumstances**. Without it, the default behavior is to **clear all data of the target application** every time a Session starts! The chat history of one of my WeChat accounts was wiped out exactly that way…

{
  "platformName": "Android",
  "deviceName": "YOUR_DEVICE_ID",
  "appPackage": "com.tencent.mm",
  "appActivity": ".ui.LauncherUI",
  "noReset": true
}
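If you would rather not copy the device ID by hand, the sketch below parses the output of `adb devices` and builds the same capabilities with `noReset` always set. The helper names (`parse_adb_devices`, `connected_devices`, `wechat_caps`) are hypothetical, and the parser assumes the usual two-column `<serial> <state>` output format.

```python
import subprocess
from typing import List


def parse_adb_devices(output: str) -> List[str]:
    """Return the IDs of devices that `adb devices` reports as ready."""
    serials = []
    for line in output.splitlines():
        parts = line.split()
        # Ready devices look like "<serial> device". The header line, blank
        # lines, and "offline"/"unauthorized" entries are skipped.
        if len(parts) == 2 and parts[1] == "device":
            serials.append(parts[0])
    return serials


def connected_devices() -> List[str]:
    """Run `adb devices` (adb must be on PATH) and parse its output."""
    result = subprocess.run(["adb", "devices"], capture_output=True,
                            text=True, check=True)
    return parse_adb_devices(result.stdout)


def wechat_caps(device_id: str) -> dict:
    """Capabilities from the article, with noReset hard-wired to True."""
    return {
        "platformName": "Android",
        "deviceName": device_id,
        "appPackage": "com.tencent.mm",
        "appActivity": ".ui.LauncherUI",
        "noReset": True,  # never drop this: it keeps the app's data intact
    }
```

Building the dictionary in one place makes it harder to forget `noReset` when copying the configuration between the inspector and a script.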

After configuring the parameters, click the big blue button to start. Appium will then force-kill and restart the WeChat client, after which you can click around with the mouse, just as in Chrome's developer tools, to inspect the hierarchy of the target component.

Taking WeChat as an example, I selected the name of an official account here. The component is a TextView whose resource-id is com.tencent.mm:id/a71. After some trial and error, I found that all elements of the same kind in the list share the same ID (for example, the labels “阿里云” (Aliyun) and “Director Ao” have the same resource-id).

But this is obviously an obfuscated ID that will probably change as WeChat is updated, so hard-coding it is not elegant. Instead, we can locate the element with XPath and then read its ID dynamically, so that the finished script keeps working across WeChat updates.

After some experimenting, I found that locating by a known official account name works best. The expression I ended up with was //android.widget.TextView[@text="阿里云"]. Fetch the element with this expression, then use that element's resource-id to fetch every name label currently visible on screen.
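The two-step lookup can be tried offline on a dump of the UI tree. The sketch below uses the standard-library `xml.etree.ElementTree`, which supports only a subset of XPath, so the path is written with a leading `.//` and single quotes. The sample XML is hypothetical and heavily trimmed, and the `com.tencent.mm:id/xyz` ID is made up for contrast; only `com.tencent.mm:id/a71` comes from the inspection above.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down page source; a real dump has far more nodes
# and attributes per node.
PAGE_SOURCE = """
<hierarchy>
  <android.widget.ListView>
    <android.widget.LinearLayout>
      <android.widget.TextView text="阿里云" resource-id="com.tencent.mm:id/a71" />
    </android.widget.LinearLayout>
    <android.widget.LinearLayout>
      <android.widget.TextView text="Example Account" resource-id="com.tencent.mm:id/a71" />
      <android.widget.TextView text="12:30" resource-id="com.tencent.mm:id/xyz" />
    </android.widget.LinearLayout>
  </android.widget.ListView>
</hierarchy>
"""


def visible_names(page_source: str, known_name: str) -> list:
    """Find one label by its known text, then return the text of every
    element that shares that label's resource-id, in document order."""
    root = ET.fromstring(page_source)
    # Step 1: locate the single label whose text we already know.
    # (A name containing a single quote would need different quoting.)
    anchor = root.find(f".//android.widget.TextView[@text='{known_name}']")
    if anchor is None:
        return []
    rid = anchor.get("resource-id")
    # Step 2: everything with the same resource-id is an account name.
    return [el.get("text") for el in root.iter()
            if el.get("resource-id") == rid]


print(visible_names(PAGE_SOURCE, "阿里云"))
```

Against a live session, the XML would come from the driver rather than a constant, but the logic of "find one known label, reuse its ID" is the same.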

But there is another difficulty. Note that I said **visible**: each query returns at most one screenful, so how do we turn the page?

This is a difficulty I never managed to solve, because I ran into two problems.

The first is the page-turning API ⁵, which does not seem to be implemented in the Python SDK (even though the documentation describes it), which is very disappointing. The API that actually exists differs from the documentation, and also from the Selenium API quoted in the documentation. So why document it at all? Since I could not turn the page, I tried simulating touch input instead, but there was a large delay between the simulated steps. What I wanted was press, drag, release. In practice, more than a second passed between the press and the drag, which triggered the long-press context menu in the WeChat list, and I could not find a way around it. I considered switching drivers, but then ran into the second problem.
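For readers who want to experiment further, one avenue I did not verify is the `swipe` helper that the Appium Python client exposes on the driver, which sends the whole press-move-release gesture in one call. The sketch below is untested against WeChat, and whether it avoids the long-press menu is an open question; `swipe_up_coords` and `scroll_one_page` are hypothetical helper names.

```python
from typing import Tuple


def swipe_up_coords(width: int, height: int,
                    margin_pct: int = 20) -> Tuple[int, int, int, int]:
    """Return (start_x, start_y, end_x, end_y) for an upward swipe along
    the vertical centre line, staying margin_pct percent away from the
    top and bottom edges of the screen."""
    x = width // 2
    start_y = height * (100 - margin_pct) // 100
    end_y = height * margin_pct // 100
    return x, start_y, x, end_y


def scroll_one_page(driver, duration_ms: int = 400) -> None:
    """Swipe the list upward once.

    `driver` is expected to offer get_window_size() and
    swipe(start_x, start_y, end_x, end_y, duration), as the Appium
    Python client's driver does.
    """
    size = driver.get_window_size()
    coords = swipe_up_coords(size["width"], size["height"])
    driver.swipe(*coords, duration_ms)
```

The coordinate math is kept in its own function so it can be checked without a device; the 400 ms duration is an arbitrary starting value to tune.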

The second is that Espresso appears to be implemented on top of Instrumentation. On startup it spends a long time compiling a dedicated APK, but running it fails with an error saying that the instrumentation application must be signed with the same certificate as the target application. For us that is of course impossible: if I could forge the signing certificate, I could ship a fake WeChat. Signature spoofing can be done at the Xposed layer or by patching the framework, but my main phone is not even rooted, so there was no point going down that road.

So the final compromise is that the program sleeps for two seconds after each pass, during which I drag the screen by hand. It is still pretty crude…

Still, the amount of code is tiny: the distilled essence is only thirty or forty lines. With the server running, just run this Python script.

import json
import time

import appium.webdriver

# Desired Capabilities for the session. Replace DEVICE_ID with the ID
# printed by `adb devices`.
dc = {
    "platformName": "Android",
    "deviceName": "DEVICE_ID",
    "appPackage": "com.tencent.mm",
    "appActivity": ".ui.LauncherUI",
    "noReset": True,  # never omit: otherwise the app's data is cleared
    "newCommandTimeout": 3600,  # seconds to wait for the next client command
}

FIRST_ACCOUNT_NAME = "阿里云"

def main():
    driver = appium.webdriver.Remote("http://localhost:4723/wd/hub", dc)
    driver.implicitly_wait(3600)

    sample_element = driver.find_element_by_xpath(
        f'//android.widget.TextView[@text="{FIRST_ACCOUNT_NAME}"]')
    rid = sample_element.get_attribute("resourceId")  # 'com.tencent.mm:id/a71'

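    accounts = set()
    retry = 3  # consecutive passes with no new names allowed before stopping

    # Scroll the list by hand while this loop polls the visible names.
    while retry > 0: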
        prev_count = len(accounts)
        elements = driver.find_elements_by_id(rid)
        for e in elements:
            accounts.add(e.text)

        if prev_count == len(accounts):
            retry -= 1
            print(f"about to stop, {retry}")
        else:
            print(f"retrieved {len(accounts) - prev_count} accounts")
            retry = 3
        time.sleep(2)

    print(list(accounts))
    with open("output.json", 'w') as f:
        json.dump(list(accounts), f)


if __name__ == "__main__":
    main()

The final result is usable.
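The stopping rule in the script (give up after three consecutive passes that find nothing new) can be isolated from Appium and tested on its own. This is a sketch with hypothetical names: `fetch_visible` stands in for whatever returns the names currently on screen.

```python
import time
from typing import Callable, Iterable, Set


def collect_until_stable(fetch_visible: Callable[[], Iterable[str]],
                         max_idle: int = 3,
                         delay: float = 2.0) -> Set[str]:
    """Poll fetch_visible() until max_idle consecutive passes add nothing new.

    delay is the pause between passes, i.e. the window in which the list
    gets scrolled (by hand, in this article).
    """
    seen: Set[str] = set()
    idle = 0
    while idle < max_idle:
        before = len(seen)
        seen.update(fetch_visible())
        # Reset the idle counter whenever a pass finds something new.
        idle = idle + 1 if len(seen) == before else 0
        time.sleep(delay)
    return seen
```

Passing a fake `fetch_visible` and `delay=0` lets you check the loop's behavior without a phone attached.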

Epilogue

Because the time from research to implementation was short, I am not very familiar with how some of the features in this article actually work; I only know that they work. That said, I have to admit that building this took me a whole evening from start to finish, so typing the names in one by one might well have been faster.

References

¹ Appium: Mobile App Automation Made Awesome. https://appium.io/
² Using Tasker to read text on a screen : tasker. https://www.reddit.com/r/tasker/comments/99gheb/using_tasker_to_read_text_on_a_screen/
[^ac-ta]: Task Assist - Run a “UI Query” easily on ANY screen to grab all its info for AutoInput. | AutoApps Forums. https://forum.joaoapps.com/index.php?resources/task-assist-run-a-ui-query-easily-on-any-screen-to-grab-all-its-info-for-autoinput.293/
[^ua-zh]: Talk about several solutions for WeChat automation - Zhihu. https://zhuanlan.zhihu.com/p/109342914
[^ua-ls]: Top 5 UI Frameworks For Android Automated Testing | Sauce Labs. https://saucelabs.com/blog/the-top-5-android-ui-frameworks-for-automated-testing
[^appium-inst]: Installation via Desktop App Download - Getting Started - Appium. http://appium.io/docs/en/about-appium/getting-started/?lang=zh#installation-via-desktop-app-download
[^appium-doc]: Status API - Appium. http://appium.io/docs/en/commands/status/
[^appium-ua]: UIAutomator2 (Android) - Appium. http://appium.io/docs/en/drivers/android-uiautomator2/
[^appium-es]: Espresso (Android) - Appium. http://appium.io/docs/en/drivers/android-espresso/
³ Releases appium/appium-desktop. https://github.com/appium/appium-desktop/releases
⁴ Chocolatey Software | Chocolatey - The package manager for Windows. https://chocolatey.org/
⁵ Scroll - Appium. http://appium.io/docs/en/commands/interactions/touch/scroll/