Android App Automation with Appium

😗Translated Content😗

This article was machine translated and has not been proofread by the author. The information it contains may be inaccurate. The author will do his best to come back (when he has time) and revise these articles. 🥰

For the Chinese version of this article, see here.

Recently, while working on collecting articles from WeChat official accounts, I ran into a follow-up requirement. To collect official accounts in batches, I first need a list of the official accounts I follow. The automation downstream is already complete, so all I need is the Chinese name of every account and the whole pipeline is done. But here is the problem: the Chinese names are not easy to get. Why not type them in by hand, one by one? I scrolled down the list to take a look: 346 of them. A quick calculation: at 10 seconds each, I would be typing mechanically for an hour. No way, that is too painful.

So, look for an automated way? WeChat is now a closed, monolithic system on both PC and mobile. A script kiddie like me cannot even capture its traffic with Fiddler; after all, it does not talk plain HTTP. From the network side there is definitely no way in. Usually at this point I would think of Frida/Xposed, but the target here is WeChat. A plug-in has already gotten one of my accounts banned before, and the account I need the data from is my main one, so I dare not take risks. That leaves only non-intrusive methods, and after looking around, working at the UI level is the safest. This article briefly describes how to use the app automation testing framework Appium [^home] to obtain arbitrary data from any Android app interface in a **non-root environment**.

Summary of data scraping methods

Readers may wonder: earlier I said we are obtaining data from an app's interface, which sounds like a crawler, so how did we end up at automated testing? This comes down to Android's internal mechanisms. What we are doing is controlling one process from within another, and for that Android must provide API support. Android offers two components for this (at least, the two that came to mind during my research): the Accessibility component, intended for people with disabilities (screen readers, voice assistants, and so on), and the UIAutomator component, intended for testers (automated testing).

What about a solution based on Accessibility? There is nothing ready-made; from what I found, you have to write code. And if I have to write code anyway, why not write Python? A Reddit post mentions that Tasker, combined with some assorted plug-ins [1], can read the content of the screen. According to the replies, there is another plug-in [^ac-ta] that can also read on-screen text. But these still feel rather unfinished, and I would have to build the rest of the functionality on my phone in Tasker's awful GUI. Just thinking about it is discouraging. Moreover, Tasker costs money. Even if I got it working, it would not be worth writing up, and the experience would not be reusable.

For UIAutomator-based solutions, a quick search turns up a whole list [^ua-ls]: UIAutomator, Selendroid, Espresso, and so on. Appium did not even make the first page of Google results. I ended up choosing it only because it appeared in one of the articles [^ua-zh] that came up in a later keyword search. Perhaps everyone else is like me and can only write Python.

There are two reasons I chose Appium, and I am not ashamed to admit them. First, the server side (the host that controls the phone) has a graphical interface [^appium-inst]: download an exe and double-click to start it. Second, the client side (which calls the API to control the phone) can be scripted in Python. The project also officially supports other languages such as Java, JS, Ruby, C#, and PHP; these clients are essentially wrappers around the server's REST API, as a glance at the documentation [^appium-doc] shows. Appium has other advantages too. For example, it is a higher-level wrapper over UIAutomator and Espresso, and the client can specify by parameter whether to use UIAutomator [^appium-ua] or Espresso [^appium-es] as the driver. That is left for the reader to explore.

Dependency installation

Appium's architecture is split into a server side and a client side. The server is a patchwork of components that runs on a computer (Windows, macOS, and Linux are supported), communicates with devices (Android phones or emulators), and exposes the UI automation interface through a Web API. The server can be downloaded from the official GitHub releases [2] and unzipped. Android debugging, however, requires some additional dependencies.

My environment is Windows 10 with chocolatey [^choco] (a package manager) installed, so all dependencies can be installed with one command.

Put simply, the dependencies are the Android SDK, adb, and a JDK:

choco install AndroidStudio adb adoptopenjdk11

Also make sure that the environment variables Appium relies on, JAVA_HOME and the Android SDK location (commonly ANDROID_HOME), are set correctly.

JAVA_HOME=C:\Program Files\AdoptOpenJDK\jdk-
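
A quick way to catch a missing variable before launching Appium is a small check like the one below. It is only a sketch: the variable names JAVA_HOME and ANDROID_HOME are what a typical Appium Android setup looks for and are an assumption here, and the helper name missing_env_vars is made up for illustration.

```python
import os


def missing_env_vars(names, environ=None):
    """Return the names that are unset or do not point at an existing directory."""
    environ = os.environ if environ is None else environ
    missing = []
    for name in names:
        value = environ.get(name, "")
        if not value or not os.path.isdir(value):
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Assumed variable names; adjust to whatever your setup requires.
    for name in missing_env_vars(["JAVA_HOME", "ANDROID_HOME"]):
        print(f"{name} is not set or does not point to a directory")
```

The optional environ argument exists only so the check can be exercised with a plain dict instead of the real environment.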

After the above steps are completed, start the Appium program and click the big blue button to start the service.
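
Once the service is up, a quick sanity check is to ask it for its status over HTTP. The sketch below assumes the address used by the script later in this article (localhost:4723 with the /wd/hub base path) and the standard WebDriver /status endpoint; the function names are illustrative.

```python
import json
import urllib.request


def status_url(host="localhost", port=4723, base_path="/wd/hub"):
    """Build the URL of the server's status endpoint."""
    return f"http://{host}:{port}{base_path}/status"


def server_is_up(url, timeout=3.0):
    """Return True if the server answers the status request with HTTP 200 and JSON."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            json.load(resp)  # raises ValueError if the body is not JSON
            return resp.status == 200
    except (OSError, ValueError):
        # Connection refused, timeout, HTTP error, or a non-JSON body.
        return False


if __name__ == "__main__":
    url = status_url()
    print(url, "is up" if server_is_up(url) else "is not reachable")
```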


Writing client side scripts

Now let's write a script to read the contents of the WeChat official account list. At the data level, an Android app's interface is a tree, represented as XML, much like the DOM tree we deal with when writing a web crawler. What we want is the data in a particular screen and component of an app. We can likewise locate the component with XPath and then extract the information we need from it. Chromium has developer tools. Does Appium? It does too!

Now let's start Appium's "developer tools". Appium's configuration is rather obscure, mainly because it coins a lot of unfamiliar terms; startup configuration parameters, for example, are called Desired Capabilities.
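
To make the tree idea concrete, the sketch below parses a small hand-written hierarchy with Python's standard library and prints it as an indented outline. The XML is a made-up miniature of what a UI dump looks like (the layout is invented for illustration); a real page source is far larger.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified UI hierarchy; not taken from a real app.
SAMPLE_UI = """
<hierarchy>
  <android.widget.FrameLayout>
    <android.widget.ListView>
      <android.widget.TextView text="Alibaba Cloud" />
      <android.widget.TextView text="Director Ao" />
    </android.widget.ListView>
  </android.widget.FrameLayout>
</hierarchy>
"""


def outline(node, depth=0):
    """Return one indented line per element, in depth-first order."""
    label = node.tag
    text = node.get("text")
    if text:
        label += f' "{text}"'
    lines = ["  " * depth + label]
    for child in node:
        lines.extend(outline(child, depth + 1))
    return lines


if __name__ == "__main__":
    print("\n".join(outline(ET.fromstring(SAMPLE_UI))))
```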


After opening a Session Window from the main interface, enter the following configuration parameters under JSON Representation of Desired Capabilities and save them. Remember to replace YOUR_DEVICE_ID with the device ID that adb reports for your phone. Most importantly, **do not omit** "noReset": true under any circumstances. Without this parameter, **all data of the target application is cleared** by default every time a Session starts! That is exactly how my WeChat chat history got wiped…

{
  "platformName": "Android",
  "deviceName": "YOUR_DEVICE_ID",
  "appPackage": "",
  "appActivity": ".ui.LauncherUI",
  "noReset": true
}

After configuring the parameters, click the big blue button to start. Appium will force-stop and restart the WeChat client, after which you can use the mouse to inspect the hierarchy of the target component, much as you would in Chrome's developer tools.
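
When the same parameters are built in code rather than typed into the inspector, a tiny helper can make it hard to forget noReset. This is a sketch only: build_capabilities is a hypothetical helper, and the package name in the usage line is a placeholder, not a value from this article.

```python
import json


def build_capabilities(device_id, app_package, app_activity):
    """Return a Desired Capabilities dict that always keeps app data (noReset)."""
    return {
        "platformName": "Android",
        "deviceName": device_id,
        "appPackage": app_package,
        "appActivity": app_activity,
        # Without this, the target app's data is wiped at session start.
        "noReset": True,
    }


if __name__ == "__main__":
    # "your.app.package" is a placeholder; substitute the real package name.
    caps = build_capabilities("YOUR_DEVICE_ID", "your.app.package", ".ui.LauncherUI")
    # The printed JSON can be pasted into the JSON Representation box.
    print(json.dumps(caps, indent=2))
```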


Taking WeChat as an example, I have selected the name of an official account here. The component is a TextView, and it has a resource-id. After some testing, I found that all elements of the same type in the list share the same ID (for example, the entries "Alibaba Cloud" and "Director Ao" have the same resource-id).

But this is obviously an obfuscated ID, and it will presumably change as WeChat is updated, so hard-coding it is not elegant. Instead, we can locate the element with XPath and then read its ID dynamically, so that the script keeps working after WeChat updates.

After some experimentation, I found that the most efficient approach is to locate the element by a known official account name, using an XPath expression that matches that name. Once the element is found, its resource-id can be used to fetch every name in the visible range.

But there is another difficulty. Note that I said "visible range": each fetch returns at most one screenful. How do we turn the page?

This is a difficulty I never managed to solve, because I ran into two problems.
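
The idea of anchoring on a known name and then reusing that element's resource-id can be tried offline. The sketch below uses Python's standard library on a made-up hierarchy; the resource-id values and the XPath strings are invented for illustration and are not the ones used against WeChat.

```python
import xml.etree.ElementTree as ET

# Hypothetical UI dump; the ids "ex:id/a1" and "ex:id/b2" are invented.
SAMPLE_UI = """
<hierarchy>
  <android.widget.ListView>
    <android.widget.TextView resource-id="ex:id/a1" text="Alibaba Cloud" />
    <android.widget.TextView resource-id="ex:id/b2" text="12 unread" />
    <android.widget.TextView resource-id="ex:id/a1" text="Director Ao" />
  </android.widget.ListView>
</hierarchy>
"""


def names_like(root, known_name):
    """Find the element showing known_name, then return the text of every
    element that shares its resource-id.

    known_name must not contain a single quote in this simple sketch.
    """
    anchor = root.find(f".//android.widget.TextView[@text='{known_name}']")
    if anchor is None:
        return []
    rid = anchor.get("resource-id")
    return [e.get("text") for e in root.iterfind(f".//*[@resource-id='{rid}']")]


if __name__ == "__main__":
    print(names_like(ET.fromstring(SAMPLE_UI), "Alibaba Cloud"))
```

Because the ID is read from the anchor element at run time, nothing in the lookup depends on the obfuscated value itself.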

The first is that the page-turning API [^appium-scroll] does not seem to be implemented in the Python SDK (even though it is in the documentation), which is very frustrating. The API that actually exists differs from the documentation, and also from the Selenium API shown in the documentation. What was the documentation written for? Since I could not turn the page, I tried simulating touch input instead. It turned out that there is a large delay between the individual steps of a simulated gesture. What I wanted was press, drag, release. In practice, more than a second passed between the press and the drag, which triggered WeChat's long-press context menu, and I found no way around it. I wanted to switch drivers, but then ran into the second problem.

The second is that the Espresso driver appears to be built on Instrumentation. On startup it spends a long time compiling a dedicated apk, and then fails at run time with an error saying that the instrumentation application needs the same signing certificate as the target application. That is of course impossible for us: if I could forge the signing certificate, I could ship a fake WeChat. Signature checks can be bypassed at the Xposed layer or with modifications to the framework, but my main phone is not even rooted, so I did not go down that road.


So the final compromise is that the program sleeps for two seconds after each round of recognition while I drag the interface by hand, which is still pretty crude…

All in all, though, the amount of code is small: boiled down, it is only thirty or forty lines. With the server running, just run the Python script.
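
One possible way to replace the manual drag, not tried in this article, is the Python client's swipe helper, which takes start and end coordinates plus a duration in milliseconds. The sketch below only computes the coordinates and hands them to a driver object supplied by the caller. The percentages and the 800 ms duration are arbitrary guesses, and whether the gesture avoids the long-press menu on a given client version is an open question.

```python
def swipe_points(width, height, start_pct=80, end_pct=30):
    """Return (start_x, start_y, end_x, end_y) for an upward drag along the
    vertical centre line, which scrolls a list down by about half a screen."""
    x = width // 2
    return x, height * start_pct // 100, x, height * end_pct // 100


def scroll_list_down(driver, duration_ms=800):
    """Drag the list upward once, sized from the driver's reported window."""
    size = driver.get_window_size()  # dict with "width" and "height"
    start_x, start_y, end_x, end_y = swipe_points(size["width"], size["height"])
    driver.swipe(start_x, start_y, end_x, end_y, duration_ms)
```

Keeping the coordinate arithmetic in its own function means it can be checked without a device attached.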

import json
import time

import appium.webdriver

# Desired Capabilities for the session; "noReset" keeps the app's data intact.
dc = dc_wechat = {
    "platformName": "Android",
    "deviceName": "DEVICE_ID",
    "appPackage": "",
    "appActivity": ".ui.LauncherUI",
    "noReset": True,
    "newCommandTimeout": 3600,
}

# Placeholder: an XPath that matches one known official account name.
SAMPLE_XPATH = "YOUR_XPATH"


def main():
    driver = appium.webdriver.Remote("http://localhost:4723/wd/hub", dc)

    # Locate one known entry, then reuse its resource-id for the whole list.
    sample_element = driver.find_element_by_xpath(SAMPLE_XPATH)
    rid = sample_element.get_attribute("resourceId")

    accounts = set()
    prev_count = -1
    retry = 3

    # Stop after three consecutive rounds that yield no new names.
    while retry > 0:
        prev_count = len(accounts)
        elements = driver.find_elements_by_id(rid)
        for e in elements:
            accounts.add(e.text)

        if prev_count == len(accounts):
            retry -= 1
            print(f"about to stop, {retry}")
        else:
            print(f"retrieved {len(accounts) - prev_count} accounts")
            retry = 3

        # Two-second window to drag the list to the next page by hand.
        time.sleep(2)

    with open("output.json", 'w') as f:
        json.dump(list(accounts), f)


if __name__ == "__main__":
    main()
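
The stop condition of the loop (give up after several consecutive rounds that yield nothing new) can be checked without a phone by feeding it canned pages. In this sketch, collect_until_stable and its fetch_page argument are hypothetical names; fetch_page stands in for reading the visible elements, and the sleep between rounds is left out.

```python
def collect_until_stable(fetch_page, max_idle_rounds=3):
    """Call fetch_page() repeatedly and gather unique items; stop once
    max_idle_rounds consecutive calls add nothing new."""
    seen = set()
    idle = 0
    while idle < max_idle_rounds:
        before = len(seen)
        seen.update(fetch_page())
        if len(seen) == before:
            idle += 1  # nothing new this round
        else:
            idle = 0   # progress was made, reset the counter
    return seen


if __name__ == "__main__":
    # Canned "screens": overlapping pages, then the list stops moving.
    pages = iter([["a", "b"], ["b", "c"], ["c"], ["c"], ["c"]])
    print(sorted(collect_until_stable(lambda: next(pages, []))))  # prints ['a', 'b', 'c']
```

Resetting the counter on any progress is what lets a slow manual drag miss a round or two without ending the run early.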


The end result works.



Because little time passed between research and implementation, I do not understand the underlying principles of some of the features in this article very well; I only know that they work. That said, I have to admit that this took a whole evening from start to finish, which is slower than typing the names in one by one would have been.




  1. Using Tasker to read text on a screen: tasker

  2. Releases · appium/appium-desktop

  [^ac-ta]: Task Assist - Run a "UI Query" easily on ANY screen to grab all its info for AutoInput. | AutoApps Forums
  [^ua-zh]: Talk about several solutions for WeChat Automation - Zhihu
  [^ua-ls]: Top 5 UI Frameworks For Android Automated Testing | Sauce Labs
  [^appium-inst]: Installation via Desktop App Download - Getting Started - Appium
  [^appium-doc]: Status API - Appium
  [^appium-ua]: UIAutomator2 (Android) - Appium
  [^appium-es]: Espresso (Android) - Appium