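Some colleagues asked me why I advise against using the nested type, and why it is said to perform poorly. So how does ES actually handle structures such as nested? Since I have read through most of the ES source code, I decided to write this up as notes, which is more comfortable than leaving comments directly in the code.
The entry point in the code is org.elasticsearch.index.shard.IndexShard#prepareIndex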
public static Engine.Index prepareIndex(DocumentMapperForType docMapper, SourceToParse source, long seqNo,
long primaryTerm, long version, VersionType versionType, Engine.Operation.Origin origin,
long autoGeneratedIdTimestamp, boolean isRetry,
long ifSeqNo, long ifPrimaryTerm) {
long startTime = System.nanoTime();
// Parsing of nested and similar structures happens here; see [2.2 Type conversion code]
ParsedDocument doc = docMapper.getDocumentMapper().parse(source);
// Check whether there is a mapping update that needs to be applied
if (docMapper.getMapping() != null) {
doc.addDynamicMappingsUpdate(docMapper.getMapping());
}
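// Convert _id into the uid term. Uid.encodeId turns the id into a compact, regular binary form that is easier to store and compress (a loose analogy is [Huffman coding]).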
Term uid = new Term(IdFieldMapper.NAME, Uid.encodeId(doc.id()));
return new Engine.Index(uid, doc, seqNo, primaryTerm, version, versionType, origin, startTime, autoGeneratedIdTimestamp, isRetry,
ifSeqNo, ifPrimaryTerm);
}
/**
* Parses the document internally; if it contains nested structures, they go through a further conversion
* @param mapping
* @param context
* @param parser
* @throws IOException
*/
private static void internalParseDocument(Mapping mapping, MetadataFieldMapper[] metadataFieldsMappers,
ParseContext context, XContentParser parser) throws IOException {
final boolean emptyDoc = isEmptyDoc(mapping, parser);
/**
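* Pre-processing: each metadata mapper adds its field to the root document. For example, _id and _version are added as fields on the document, not as documents of their own; see [2.3 Support for fields such as _id] below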
*/
for (MetadataFieldMapper metadataMapper : metadataFieldsMappers) {
metadataMapper.preParse(context);
}
if (mapping.root.isEnabled() == false) {
// entire type is disabled
parser.skipChildren();
} else if (emptyDoc == false) {
// Parse an object or nested structure. This method is called recursively, once per object or nested level
parseObjectOrNested(context, mapping.root);
}
// Post-processing: add _version and similar fields to each non-root document
for (MetadataFieldMapper metadataMapper : metadataFieldsMappers) {
metadataMapper.postParse(context);
}
}
Code location: org.elasticsearch.index.mapper.MetadataFieldMapper#preParse
Only the handling of _id is shown below
/**
* _id is added as a field on the current document
* @param context
*/
@Override
public void preParse(ParseContext context) {
BytesRef id = Uid.encodeId(context.sourceToParse().id());
context.doc().add(new Field(NAME, id, Defaults.FIELD_TYPE));
}
This is just one example, _id. Others such as _version, _seqno and _source are handled in a similar way.
The way ES converts nested structures is rather interesting.
/**
* Parses an object or nested structure. The call is recursive, so that objects and nested structures at any depth are handled
* @param context
* @param mapper
* @throws IOException
*/
static void parseObjectOrNested(ParseContext context, ObjectMapper mapper) throws IOException {
if (mapper.isEnabled() == false) {
context.parser().skipChildren();
return;
}
XContentParser parser = context.parser();
XContentParser.Token token = parser.currentToken();
if (token == XContentParser.Token.VALUE_NULL) {
// the object is null ("obj1" : null), simply bail
return;
}
String currentFieldName = parser.currentName();
if (token.isValue()) {
throw new MapperParsingException("object mapping for [" + mapper.name() + "] tried to parse field [" + currentFieldName
+ "] as object, but found a concrete value");
}
ObjectMapper.Nested nested = mapper.nested();
// For a nested structure, a new empty document is created every time. Because #{innerParseObject} is recursive, one source object ends up as multiple documents
if (nested.isNested()) {
// Continues in [2.4.2 Initial entry point for nested conversion] below
context = nestedContext(context, mapper);
}
// if we are at the end of the previous object, advance
if (token == XContentParser.Token.END_OBJECT) {
token = parser.nextToken();
}
if (token == XContentParser.Token.START_OBJECT) {
// if we are just starting an OBJECT, advance, this is the object we are parsing, we need the name first
token = parser.nextToken();
}
// Parse the fields of the object
innerParseObject(context, mapper, parser, currentFieldName, token);
// restore the enable path flag
if (nested.isNested()) {
nested(context, nested);
}
}
/**
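* Internal nested conversion: creates an empty document for the nested object
* TODO: a nested doc has the same _id as its parent doc, and Lucene simply appends every doc it writes, so several Lucene docs end up sharing one _id. Still to verify: how a GET by id picks the root doc among them and returns the expected result.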
* @param context
* @param mapper
* @return
*/
private static ParseContext nestedContext(ParseContext context, ObjectMapper mapper) {
// Create the nested context together with a new empty document. The fields and objects of the nested structure are all added to it later
context = context.createNestedContext(mapper.fullPath());
ParseContext.Document nestedDoc = context.doc();
ParseContext.Document parentDoc = nestedDoc.getParent();
// We need to add the uid or id to this nested Lucene document too,
// If we do not do this then when a document gets deleted only the root Lucene document gets deleted and
// not the nested Lucene documents! Besides the fact that we would have zombie Lucene documents, the ordering of
// documents inside the Lucene index (document blocks) will be incorrect, as nested documents of different root
// documents are then aligned with other root documents. This will lead to the nested query, sorting, aggregations
// and inner hits to fail or yield incorrect results.
IndexableField idField = parentDoc.getField(IdFieldMapper.NAME);
if (idField != null) {
// We just need to store the id as indexed field, so that IndexWriter#deleteDocuments(term) can then
// delete it when the root document is deleted too.
nestedDoc.add(new Field(IdFieldMapper.NAME, idField.binaryValue(), IdFieldMapper.Defaults.NESTED_FIELD_TYPE));
} else {
throw new IllegalStateException("The root document of a nested document should have an _id field");
}
// the type of the nested doc starts with __, so we can identify that its a nested one in filters
// note, we don't prefix it with the type of the doc since it allows us to execute a nested query
// across types (for example, with similar nested objects)
nestedDoc.add(new Field(TypeFieldMapper.NAME, mapper.nestedTypePathAsString(), TypeFieldMapper.Defaults.NESTED_FIELD_TYPE));
return context;
}
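Read the English comments inside carefully. The key point is that a nested document carries the same _id as its parent. The stated reason is deletion: IndexWriter#deleteDocuments(term) on that _id removes the root document and all of its nested documents together, so no zombie nested documents are left behind and the document blocks stay correctly aligned. A GET by id does not need to stitch the nested documents back together, because _source is kept on the root document. This is the approach ES takes here.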
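To make the deletion argument concrete, here is a minimal sketch in plain Java. It is not Elasticsearch or Lucene code: the class BlockDeleteSketch, the Doc type and the helper names are all hypothetical, and the "index" is just a list. It only illustrates why a delete by the _id term needs the nested documents to carry that term too.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of delete-by-term on a block of documents. Not Lucene code; all names are hypothetical. */
final class BlockDeleteSketch {

    static final class Doc {
        final String id;   // value of the _id term, or null if the doc has no _id field
        final String type; // "__comments" marks a nested doc here, "_doc" a root doc (illustrative labels)
        Doc(String id, String type) {
            this.id = id;
            this.type = type;
        }
    }

    /** Builds one block: the nested docs first, then the root doc. */
    static List<Doc> block(String id, int nestedCount, boolean copyIdToNested) {
        List<Doc> docs = new ArrayList<>();
        for (int i = 0; i < nestedCount; i++) {
            docs.add(new Doc(copyIdToNested ? id : null, "__comments"));
        }
        docs.add(new Doc(id, "_doc"));
        return docs;
    }

    /** Mimics a delete by term: removes every doc whose _id matches and returns how many were removed. */
    static int deleteById(List<Doc> index, String id) {
        int before = index.size();
        index.removeIf(doc -> id.equals(doc.id));
        return before - index.size();
    }

    public static void main(String[] args) {
        List<Doc> withIds = block("1", 2, true);
        System.out.println("removed with shared _id: " + deleteById(withIds, "1")
                + ", left over: " + withIds.size());

        List<Doc> withoutIds = block("1", 2, false);
        System.out.println("removed without shared _id: " + deleteById(withoutIds, "1")
                + ", left over (zombies): " + withoutIds.size());
    }
}
```

When the nested docs copy the parent's _id, a single delete clears the whole block. When they do not, only the root goes away and the nested docs remain, which is the zombie case the English comments warn about.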
The entry point for filling in each field is org.elasticsearch.index.mapper.DocumentParser#innerParseObject
This is a recursive call, and it is a bit convoluted to follow.
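Since the recursion is hard to follow in the real code, the following is a simplified sketch of the idea in plain Java. It is not the Elasticsearch implementation: NestedFlattenSketch, Doc, flatten and the nestedPaths parameter are hypothetical names, and the source document is modeled as a Map. It assumes that a plain object is flattened into dotted field names on the current document, while every object under a nested path gets a document of its own, with the root document placed last in the block.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy model of object/nested flattening. Not Elasticsearch code; all names are hypothetical. */
final class NestedFlattenSketch {

    /** One flat, Lucene-like document. path is "" for the root, otherwise the nested path. */
    static final class Doc {
        final String path;
        final Map<String, List<Object>> fields = new LinkedHashMap<>();
        Doc(String path) {
            this.path = path;
        }
    }

    static List<Doc> flatten(Map<String, Object> source, Set<String> nestedPaths) {
        List<Doc> docs = new ArrayList<>();
        Doc root = new Doc("");
        parseObject(source, "", root, nestedPaths, docs);
        docs.add(root); // the root goes last, after all of its nested children
        return docs;
    }

    @SuppressWarnings("unchecked")
    private static void parseObject(Map<String, Object> obj, String prefix, Doc current,
                                    Set<String> nestedPaths, List<Doc> docs) {
        for (Map.Entry<String, Object> entry : obj.entrySet()) {
            String path = prefix.isEmpty() ? entry.getKey() : prefix + "." + entry.getKey();
            Object raw = entry.getValue();
            // Treat a single value as a one-element array so both cases share one loop.
            List<Object> values = raw instanceof List
                    ? (List<Object>) raw
                    : Collections.<Object>singletonList(raw);
            for (Object value : values) {
                if (value instanceof Map) {
                    boolean nested = nestedPaths.contains(path);
                    // nested: a fresh document per object; plain object: keep writing to the current one
                    Doc target = nested ? new Doc(path) : current;
                    parseObject((Map<String, Object>) value, path, target, nestedPaths, docs);
                    if (nested) {
                        docs.add(target);
                    }
                } else if (value != null) {
                    current.fields.computeIfAbsent(path, k -> new ArrayList<>()).add(value);
                }
            }
        }
    }

    public static void main(String[] args) {
        Map<String, Object> first = new LinkedHashMap<>();
        first.put("user", "alice");
        Map<String, Object> second = new LinkedHashMap<>();
        second.put("user", "bob");
        Map<String, Object> source = new LinkedHashMap<>();
        source.put("title", "post");
        source.put("comments", Arrays.asList(first, second));

        for (Doc doc : flatten(source, Collections.singleton("comments"))) {
            System.out.println((doc.path.isEmpty() ? "<root>" : doc.path) + " " + doc.fields);
        }
    }
}
```

With "comments" declared as nested, one source document with two comments becomes three documents. With an empty nestedPaths set, the same input stays a single document whose comments.user field holds both values. That difference in document count is a large part of why nested costs more at index time.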
The handling of _version is shown below.
The entry point is org.elasticsearch.index.mapper.VersionFieldMapper#postParse, where you can read the concrete implementation.
@Override
public void postParse(ParseContext context) {
// In the case of nested docs, let's fill nested docs with version=1 so that Lucene doesn't write a Bitset for documents
// that don't have the field. This is consistent with the default value for efficiency.
Field version = context.version();
assert version != null;
for (Document doc : context.nonRootDocuments()) {
// Add a _version field to this doc
doc.add(version);
}
}
Only _version is shown here as an example; the other fields are handled similarly.
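The three phases walked through above (preParse on the root document, parsing of the body with its nested documents, then postParse on the non-root documents) can be put together in a small sketch. This is plain Java, not Elasticsearch code: MetadataPhasesSketch, Context, MetadataStep and the field values are hypothetical stand-ins, and the body parsing is reduced to creating one nested document per value.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model of the preParse / parse / postParse phases. Not Elasticsearch code. */
final class MetadataPhasesSketch {

    static final class Doc {
        final boolean root;
        final Map<String, Object> fields = new LinkedHashMap<>();
        Doc(boolean root) {
            this.root = root;
        }
    }

    /** Hypothetical stand-in for ParseContext; the root doc is always docs.get(0). */
    static final class Context {
        final String id;
        final List<Doc> docs = new ArrayList<>();
        Context(String id) {
            this.id = id;
            docs.add(new Doc(true));
        }
        Doc rootDoc() {
            return docs.get(0);
        }
        /** Mirrors the idea of nestedContext: a fresh doc that copies the parent's _id. */
        Doc newNestedDoc() {
            Object rootId = rootDoc().fields.get("_id");
            if (rootId == null) {
                throw new IllegalStateException("The root document should have an _id field");
            }
            Doc nested = new Doc(false);
            nested.fields.put("_id", rootId);
            docs.add(nested);
            return nested;
        }
    }

    /** Hypothetical stand-in for a metadata mapper with its two hooks. */
    interface MetadataStep {
        default void preParse(Context context) {}
        default void postParse(Context context) {}
    }

    static final MetadataStep ID = new MetadataStep() {
        @Override
        public void preParse(Context context) {
            context.rootDoc().fields.put("_id", context.id);
        }
    };

    static final MetadataStep VERSION = new MetadataStep() {
        @Override
        public void preParse(Context context) {
            context.rootDoc().fields.put("_version", 1L); // placeholder value for the sketch
        }
        @Override
        public void postParse(Context context) {
            Object version = context.rootDoc().fields.get("_version");
            for (Doc doc : context.docs) {
                if (!doc.root) {
                    doc.fields.put("_version", version);
                }
            }
        }
    };

    static List<Doc> parse(String id, List<String> nestedValues) {
        List<MetadataStep> steps = Arrays.asList(ID, VERSION);
        Context context = new Context(id);
        for (MetadataStep step : steps) {
            step.preParse(context); // phase 1: metadata fields on the root doc
        }
        for (String value : nestedValues) {
            context.newNestedDoc().fields.put("comments.text", value); // phase 2: body
        }
        for (MetadataStep step : steps) {
            step.postParse(context); // phase 3: fill in the non-root docs
        }
        return context.docs;
    }

    public static void main(String[] args) {
        for (Doc doc : parse("1", Arrays.asList("first", "second"))) {
            System.out.println((doc.root ? "root   " : "nested ") + doc.fields);
        }
    }
}
```

The ordering matters in this sketch for the same reason as in the real code: the nested documents can only copy _id because preParse has already put it on the root, and postParse can only reach the nested documents because the body has been parsed by then.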
Next up is the ES write path. We will look at how ES handles distributed requests and ensures high availability.